As long ago as 1863 [23] Florence Nightingale¹ proposed the tabu-
lation of ‘‘seven elements which are required to enable us to tabulate
the results of hospital experience. The primary object of these tables
was to obtain a uniform record of facts from which to deduce statis-
tical results among which the following may be mentioned:


‘‘1. The total sick population—i.e., the number of beds constantly
occupied during the year by each disease for each age and sex.

‘‘2. The number of cases of each age, sex, and disease submitted to
(medical or surgical) treatment during the year.

‘‘3. The average duration in days and parts of a day of each dis-
ease for each sex and age.

‘‘4. The mortality from each disease for each sex and age.

‘‘5. The annual proportion of recoveries to beds occupied and to
cases treated for each age, sex, and disease.’’


REASONS FOR RECENT INTEREST 


Today, nearly a century later, the barest beginning has been made
along these lines and there are still many hospitals which would experi-
ence considerable difficulty in ‘‘deducing’’ all of the statistical results
desired by Miss Nightingale. The seed had been planted but germina-
tion, or at any rate growth, has been slow indeed. From the stand-
point of identifying the elements which must exist in the environment
for the healthy development of the statistical plant, it may be enlight-
ening to inquire as to the reasons for this lag and for the recent up-
surge of interest in this problem.


* From the Department of Preventive Medicine and Public Health, Vanderbilt University
¹ For an account of Miss Nightingale as a statistician see reference [20].

It would appear that a realization of the fact that the hospital ‘‘ex- 
ercises a powerful influence on the pattern of medical and health care 
in a community’’ [7] is essential. Steadily rising hospital costs per
patient day coupled with rising admission rates [11] have focused at- 
tention on the need for more factual data to define the extent and na- 
ture of the problems which the hospitals are facing. Perhaps it is not 
too great an exaggeration to say that hospitals in some cases are like a 
health department which attempts to develop a sound program with 
no more knowledge of the health problems of the community than could 
be gained from the crude death rate. The problems which the hos- 
pitals face are population problems. Only by a much more intimate 
knowledge of the medical care problems of the community from the 
consumer’s as well as the producer’s standpoint is it likely that the 
hospital will be in a position to evaluate the future demands which may 
be made upon it and to discharge efficiently its functions as a com-
munity agency. Gradually the realization of the need for specific
rates in studying these problems is having its influence in the develop- 
ment and use of statistical practices in hospital work. 

Another force which has operated to produce a more widespread 
interest in hospital statistics than existed at the time Miss Nightingale 
wrote her ‘‘Notes’’ has been the increasingly quantitative nature of 
medicine. Writing in 1921, Pearl [25] has pointed out that ‘‘the 
entire history of medicine shows that there has been almost from the 
first an earnest desire and effort on the part of some of its leaders to 
develop quantitative modes of thought and methods of work. The 
large measure of progress which has been made in this direction is 
sufficiently evidenced by the number of items of diagnostic and clinical 
significance which are measured and recorded in quantitative terms.’’ 
If this was true then, how much more so it must be now. If further 
evidence is needed, one has only to note the increasing number of med-
ical schools with teaching hospitals attached which have incorporated
courses in medical statistics in the curriculum [26]. In addition, 
therefore, to much greater production of quantitative data the medical 
man is being trained to think in terms of groups as well as in terms of 
the individual, all of which produces a favorable environment for the 
development of hospital statistics. 

The lack of mechanical equipment for handling large masses of 
data probably has had a great deal to do with the tardy growth of 
statistics in the hospital field. It has really only been since the first 
World War that punch card equipment has come into widespread use. 
In the first of several papers from the newly organized statistical de- 
partment of the Johns Hopkins Hospital, Raymond Pearl [25] de- 
scribed the general principles of the application of punch card tech- 
niques to hospital data. Since that time many other hospitals have 
applied these principles, some examples being the Mayo Clinic, the 
University Hospital at Ann Arbor, Michigan, the Columbia Presby- 
terian Hospital in New York City, the Massachusetts General Hospital, 
Vanderbilt University Hospital and many others. It should not be 
thought, however, that the punched card method is confined to large 
hospitals. Small hospitals can punch cards and arrange to have them 
tabulated elsewhere. Or, as we shall see later, several hospitals may
engage in a cooperative arrangement. 


BASIC ELEMENTS NEEDED FOR HOSPITAL STATISTICS 


With the preceding discussion as a background, the basic elements 
needed for hospital statistics may be considered. The primary source 
of data is the hospital record of the patient. A hospital record’s pri- 
mary function is to help in the treatment of the patient. It is so inti- 
mately tied to the individual, however, that many of its research and 
study uses [13] may be overlooked because the emphasis in such work 
is on the mass of records and on the group of individuals they represent. 

Even with respect to the group, hospital records have certain pecu-
liarities not found in other types of records [8]. A hospital record
has to take care of a tremendous variety of things. It is not directed 
toward obtaining information on one particular phase of the situation, 
such as a survey record usually is or an immunization record might be. 
Also the record is made out and used by a great many different people. 
Hence, it must be fairly simple and capable of being readily taught.

One of the greatest drawbacks to the statistical exploitation of the
data in the hospital histories lies in the fact that emphasis in history 
taking is upon the positive findings. All too often negative findings 
are not recorded. On the other hand, the narrative form of the usual 
history makes it almost impossible for the physician to take the time 
to record all negative findings. This difficulty has been recognized for 
a long time and the remedy which has been proposed is to so design the 
form that a series of definite questions are asked for the items which 
are to form a part of every history [14, 25]. For other items whose 
appearance in the history depends upon the circumstances surrounding 
the individual case, blank space is left for the physician to record his
observations in whatever form he pleases. Pearl [25] discusses at 
some length the objections which are likely to be raised to such a form 
and disposes of them in trenchant fashion. Dunn and Rockwood [15] 
designed a form of the type mentioned but so far as is known it has 
never been put to widespread use. 

Why has it been so difficult to place into operation such forms 
which have been developed on sound principles? As has been pointed 
out above, it has only been very recently that the medical profession 
has been ‘‘statistically’’ conditioned to the point that the validity of 
the arguments advanced by Pearl and by Dunn and Rockwood may be 
appreciated by the medical man who writes the histories. There is also 
something of a problem in pedagogy here. Oftentimes, tabulations of 
records solely for the purpose of demonstrating their incompleteness 
have a very salutary effect upon future record keeping and result in a 
request from the medical man for a record designed along the principles 
exemplified in Dunn and Rockwood’s forms. 

The adoption of the unit record system has also aided in the devel- 
opment of the statistical uses of the records. This system provides 
that the first time a patient appears at the hospital, either as an in- 
patient or an out-patient he is assigned a number. This number re- 
mains unchanged no matter how often or over how long a period of 
time he continues to appear at the hospital and all observations which 
are made upon him are incorporated in one record which bears this 
number. 

The significance of the unit medical record is well expressed by 
Kurtz [21]. To her ‘‘it is... the practical expression of a funda- 
mental medical concept, that the individual—not some part of him or 
some episode in his history, but the whole individual—is the unit of
medical practice and study.’’ Until this concept was introduced into 
the organization of medical records any attempt to bring together all
the cases of a given condition or to engage in follow-up studies was 
almost a hopeless task, and statistical practice in relation to hospital 
records suffered accordingly. Most hospitals at the present time have 
adopted the unit record system, and new statistical uses are constantly 
being found for the data in the records. 

Broadly speaking, the statistical uses to which the information in 
the hospital may be put can be categorized in terms of the universe 
of discourse to which they refer [10]. We may study, for example, the 
age distribution of individuals appearing in the syphilis clinic and 
diagnosed as having syphilis. But the generalization of such a study 
to all individuals appearing at the hospital or to the population at 
large would be totally unwarranted. The statistical universe here is 
limited to a particular kind of case appearing in the syphilis clinic. 
This is the type of universe from which most of the studies which are 
done by clinicians are drawn. 

We may also be interested in the prevalence of various kinds of 
disease conditions among the different individuals who come to the 
hospital. We are then dealing with the hospital universe. Studies of 
data drawn from this universe have very definite uses [13], but such 
data will give us no idea of what the prevalence of these conditions is in 
the community. The difficulties which lie in such generalizations are 
well illustrated by Berkson [8] in his discussion of the limitations of the 
application of 4-fold table analysis to hospital data. 

A third universe which must be recognized in relation to hospital 
statistics is the population from which the individuals coming to the 
hospital are drawn, or, to put it another way, ‘‘the population which 
the hospital is serving’’ [10]. For any one hospital, this is usually an 
extremely difficult population to define quantitatively. The task is 
somewhat simplified when all the hospitals in a community are con- 
sidered. 


CLASSIFICATION PROBLEMS 


Attempts in the past to study samples drawn from these different 
universes have resulted in the recognition of certain technical statis- 
tical problems which needed to be solved before any great amount of 
headway could be made in the analysis of hospital data. Most of these 
are problems of classification. 

Consider, for example, the clinician who wishes to work with a 
sample drawn from the ‘‘case’’ universe. He wishes to get all the 
cases of a given kind together. It is important for such a study that
his colleagues use the same term for indications of the same set of con- 
ditions. It is especially important if any attempt is to be made to 
combine the statistics of several hospitals. The experience of the Hos- 
pital Discharge Study [18] of New York City serves to illustrate the 
point. ‘‘The greatest difficulties were met in the terminology of 
cardiovascular conditions. Here more than anywhere else, symptoms,
such as ‘enlargement of the heart’ or ‘cardiac failure’ replaced diag- 
noses. Vague notations—were so very frequent that there was no 
basis for further classification among the 25,758 cases reported with 
heart disease.’’ This difficulty has long been recognized and the need 
for a standardized nomenclature of diseases was felt by Florence
Nightingale when she sought the aid of Doctor Farr in the development 
of a morbidity list for use in hospitals. Massachusetts General Hos- 
pital and Bellevue Hospital in New York City developed their own 
nomenclatures to meet this need around the turn of the century and 
these nomenclatures were adopted by many other hospitals in the 
country. By 1928 a number of other nomenclatures were also in use
and the need for a unification of these nomenclatures resulted in the 
National Conference on Nomenclature of Disease which brought out 
a Standard Classified Nomenclature of Disease. The publication of 
this Standard Nomenclature has since been taken over by the American 
Medical Association [22] and by 1935 the Nomenclature had been
adopted by nearly 500 hospitals in the United States and Canada. 

A nomenclature, however, is only the first step in the study of 
problems related to the ‘‘hospital’’ universe or to the population which
the hospital serves. ‘‘The function of a nomenclature is to train the 
physician to use the clearest and most acceptable diagnostic terms to 
describe a particular clinical case.’’ A classification of disease for
statistical purposes is also needed to group the thousands of medical 
diagnoses so that they may be presented in meaningful tabular form 
and so that with the aid of an alphabetical index ‘‘a reasonably intelli- 
gent diagnosis coder may assign diagnostic statements to the various 
categories of the list as accurately as is possible from the stated causes 
of illness’’ [24]. In 1936 Berkson [1, 2] presented such a classifica-
tion and a plan for meeting, by means of one punch card, the needs of
the clinician who wishes to get all the cases of a particular diagnosis
and, also, the statistical requirements of periodic statistical summaries
of medical conditions. This scheme is an ingenious use of a punch
card code consisting, for the diagnostic items, essentially of a main
number and a sub-number, the main number serving the statistical
needs and the sub-number permitting the identification of cases of a 
particular diagnosis. The punch card is also designed to permit writ- 
ing the diagnoses directly on the card and also the coding of the 
diagnoses. 
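
The two uses of the main number and sub-number can be sketched in a few lines of Python. This is only an illustration of the idea, not Berkson's actual card layout: the card fields, the patient numbers, and the numeric codes below are all invented for the example.

```python
from collections import Counter

# Hypothetical coded cards: (patient number, main number, sub-number).
# Every value here is invented for illustration only.
cards = [
    (101, 30, 1),
    (102, 30, 2),
    (103, 30, 1),
    (104, 45, 0),
]


def statistical_summary(cards):
    """Count cards per main number, i.e. per statistical category."""
    return Counter(main for _, main, _ in cards)


def cases_of_diagnosis(cards, main, sub):
    """List the patient numbers whose card carries one particular diagnosis."""
    return [patient for patient, m, s in cards if (m, s) == (main, sub)]


summary = statistical_summary(cards)        # periodic summary by main number
selected = cases_of_diagnosis(cards, 30, 1)  # clinician's retrieval of cases
```

The same set of cards serves both purposes: the summary ignores the sub-number, while the retrieval uses the pair.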

The International List of Causes of Death served as the basis for
Berkson’s tabular outline for the classification of disease terms because 
it ‘‘has the widest current use for purposes of statistical enumeration.’’ 
The Welfare Council of New York also evolved a Classified List of 
Diagnoses in connection with its Hospital Discharge Study and this 
has been reported by Jeter and Fraenkel [19] and by Fraenkel [16]. 
As a result of experience gained with these lists and of the experience 
of other groups a committee of consultants appointed by the Surgeon 
General of the United States Public Health Service evolved a more 
general ‘‘Diagnosis Code for Tabulating Morbidity Statistics’’ which
appeared in 1940 [24]. 

At the Fifth International Conference for the Revision of the 
International List of Causes of Death held in Paris in 1938, it was 
recommended that ‘‘the United States Government continue its studies 
of the statistical treatment of joint causes of death’’. In compliance 
with this resolution the Secretary of State appointed a United States 
Committee on Joint Causes of Death and this committee ‘‘decided that 
before taking up the matter of joint causes it would be advantageous 
to consider classification of disease from the point of view of morbidity 
and mortality since the joint cause problem pertains to both types of 
statistics’’. Accordingly it appointed a subcommittee to draft a 
“classification that might be used for the statistical coding of both 
records of illness and causes of death’’. The labors of this group 
brought forth in April 1946 a Proposed Statistical Classification of 
Diseases, Injuries and Causes of Death [27] which was adopted by the 
U. S. Committee on Joint Causes of Death. After a number of fur- 
ther corrections the parent committee turned over the Proposed Classi- 
fication to the Interim Commission of the World Health Organization 
in March 1947 at the request of the latter. The introduction to the 
Proposed Classification contains an excellent history of the develop- 
ment of classifications of disease. The Proposed Classification itself 
is perhaps as great a step forward as the original development of the 
International List of Causes of Death. It is to be hoped it will be as 
widely adopted.

With the development of classifications of disease, knowledge of 
the extent and causes of illness in the population also began to grow. 
Most of this knowledge is the result of special surveys of one type or 
another except for such information as is furnished by the reporting of 
certain notifiable diseases. It has already been indicated in the open-
ing paragraph of this review that the idea of combining the statistics
of several hospitals to furnish morbidity data is not a new one. The
universe to which such data apply, however, must be clearly recognized
and the evolution of such statistics mirrors the constant striving to
enlarge the universe to as much of the population at large as possible.


STUDIES OF HOSPITAL MORBIDITY 


In 1913 Frederick L. Hoffman presented the statistics of the Johns
Hopkins Hospital from 1892 to 1911 [17]. This monograph was an
excellent demonstration of what could be done in studying the records
of one hospital. The universe to which it applied, however, is the
‘‘hospital universe’’ only. In the same year Bolduan [4] proposed
the adoption of a plan for collecting morbidity statistics from all the
hospitals in a community utilizing a procedure similar to that by which
data on death certificates were compiled. Essentially his idea was to
have a ‘‘discharge certificate’’ sent to a central collecting agency each
time a patient was discharged from the hospital. The central agency
would then code and tabulate the data on the certificate. The greatest
deterrent to the adoption of this scheme lay in the difficulties which were
encountered in classifying the data in the different hospitals. How-
ever, this plan served as the basis for a study of a sample of 21,000
patients in six hospitals in New York City in 1923. Its principle was
also adopted in the Hospital Discharge Study of the Welfare Council of
New York City published in 1942 in which an analysis was made of 
over half a million patients discharged from the hospitals of the city. 
Insofar as this material covered most of the hospital facilities in the 
city, to that extent can rates based upon the population of the city be 
considered a reflection of the hospitalized illness which occurred in the 
city as a whole. It must be remembered, however, that this does not 
give a picture of total morbidity because only the more serious illnesses 
are hospitalized. As has been pointed out above, this study was espe- 
cially valuable in the development of a morbidity classification for use 
in problems of this kind. 

In 1939 Crosby [9] presented the results of a study of rural hos-
pital morbidity. It was emphasized that the data applied to the ‘‘hos-
pital universe’’ only, but this study did for a rural hospital what Hoff-
man’s study did for the Johns Hopkins Hospital.

The studies mentioned up to this point have all started with a
study of patients who had already been in the hospital. But in 1941 
[11] and 1943 [12] Councell presented the results of a study which 
combines ‘‘the use of survey and hospital data in that it starts with a 
survey group and proceeds to hospital sources of information for these 
persons’’. It has been pointed out that the population which a hos- 
pital serves is difficult to define. Here the idea of taking a known 
population and finding out what hospital service it receives has been 
explored. The recent growth of hospital service plans, such as the 
Blue Cross, is making a wealth of material available for study of these 
problems. The reports of Van Dyk [27] and Colman [5, 6] show what 
can be done with these data. Here again, ‘‘although these records per- 
tain to a well-defined population, the rates are biased in that the per- 
sons studied are a selected group. Admission figures are higher than 
in the general population because, by the prepayment of fees, the eco- 
nomic barrier to hospitalization is considerably broken down.’’ One still 
largely untapped source of morbidity statistics is the records of out- 
patients. 

It has been shown how in recent years many of the obstacles to the 
development of hospital statistics have been broken down. Classifica- 
tions of disease have been developed which permit more ready combi- 
nation of the statistics of many hospitals, mechanical procedures for 
handling large masses of data have been perfected, and the medical 
man and medical environment have become more and more statistically 
minded and appreciative of the requirements of hospital statistics. 
Much still needs to be done, but hospital statistics are on the threshold 
of a tremendous development and ‘‘hospital horizons are expanding’’ 
[7]. It is to be hoped that the need for more trained personnel in this 
area will be recognized by those young people who are seeking fields 
in which they may pursue a promising career. 

In previous papers [3] [4] a ranking method has been described
for the rapid approximate determination of the significance of differ- 
ences between two treatments, when the experimental data consists of 
unpaired replications, paired replications, or replications occurring in 
two or more groups. Brief tables were given which furnished the rank 
totals for the 0.05, 0.02 and 0.01 levels of significance. These
tables were prepared by making use of certain properties of the par- 
titions of numbers, but the method becomes impractical with larger 
numbers of replicates. A recent paper by Mann and Whitney [1] 
describes a similar test and gives probability tables covering the range 
from 3 to 8 replicates per treatment, including the case of unequal 
numbers of unpaired replicates under the two treatments. These au- 
thors tabulated the probabilities against the serial number of possible 
rank totals, 0, 1, 2, ..., U, instead of the rank total itself. Their tables
give probabilities for one tail of the distribution only. 


UNPAIRED REPLICATES 


Let $N$ be the number of replicates, then $2N$ will be the number of
rank scores from 1 to $2N$ to be assigned to the data. If the two treat-
ments do not differ, then the total score for one of the treatments in
repeated experiments will be distributed around the expected total $T$
with variance equal to $NT/6$. This may be shown in the following
manner:

The variance of a single item from the rectangular population 1 to
$2N$ is $(4N^2-1)/12$ [2]. The variance $S^2$ of a total of $N$ items would
be $N(4N^2-1)/12$, except for the fact that no rank number can occur
more than once; in other words it is a drawing ‘‘without replacement.’’
This reduces the variance by a factor $1-(N-1)/(2N-1)$, or
$N/(2N-1)$. The corrected variance is $N^2(4N^2-1)/[12(2N-1)]$. If the
corrected variance is divided by the expected total $T$, which may be
written $2N(2N+1)/4$, we obtain $N/6$ for the ratio $S^2/T$, and hence
$S^2 = NT/6$.
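
The result can be checked by brute force for small numbers of replicates. The sketch below is only a verification aid, with a function name chosen for illustration. It enumerates every equally likely way of giving $N$ of the ranks 1 to $2N$ to one treatment, then compares the exact mean and variance of the rank total with $2N(2N+1)/4$ and $NT/6$.

```python
from fractions import Fraction
from itertools import combinations


def rank_total_moments(n):
    """Exact mean and variance of one treatment's rank total.

    Every choice of n ranks out of 1..2n is equally likely when the
    two treatments do not differ (drawing without replacement).
    """
    totals = [sum(c) for c in combinations(range(1, 2 * n + 1), n)]
    count = len(totals)
    mean = Fraction(sum(totals), count)
    var = sum((t - mean) ** 2 for t in totals) / count
    return mean, var


for n in range(1, 7):
    mean, var = rank_total_moments(n)
    expected_total = Fraction(2 * n * (2 * n + 1), 4)  # T = 2N(2N+1)/4
    assert mean == expected_total
    assert var == n * expected_total / 6               # S^2 = NT/6
```

Exact fractions are used so that the equality test is not affected by rounding.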

The distribution of rank totals obtained by this procedure is suffi-
ciently close to normal so that the totals corresponding to the 0.05,
0.02, and 0.01 levels may be closely approximated by means of the
following formulae:

$T_{.05} = T - 1.960\sqrt{NT/6}$
$T_{.02} = T - 2.326\sqrt{NT/6}$
$T_{.01} = T - 2.576\sqrt{NT/6}$

In the case of the 0.001 level, however, the approximation is not
sufficiently close to be useful.
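
As a worked illustration, the three formulae can be evaluated directly. The sketch assumes, as the caption of Table I states, that totals are rounded to the nearest whole number; the function and variable names are illustrative only.

```python
import math

# Normal deviates used in the formulae for the three levels.
DEVIATES = {0.05: 1.960, 0.02: 2.326, 0.01: 2.576}


def unpaired_critical_totals(n):
    """Approximate rank totals for n unpaired replicates per treatment."""
    t = 2 * n * (2 * n + 1) / 4   # expected total T
    sd = math.sqrt(n * t / 6)     # square root of the variance NT/6
    return {p: round(t - z * sd) for p, z in DEVIATES.items()}


print(unpaired_critical_totals(5))
print(unpaired_critical_totals(20))
```

For $N = 5$ this gives 18, 16 and 15, and for $N = 20$ it gives 338, 324 and 315, in agreement with the corresponding rows of Table I.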


PAIRED REPLICATES 


In this case $N$ rank numbers are assigned to the differences between
pairs, and each rank number is given a sign corresponding to the sign
of the difference. The expected total of one sign, $T$, is $N(N+1)/4$,
and the variance of the total is $2NT/6$. The totals corresponding to
the 0.05, 0.02, and 0.01 levels of significance may be calculated as
indicated previously in the case of unpaired replicates:

$T_{.05} = T - 1.960\sqrt{2NT/6}$
$T_{.02} = T - 2.326\sqrt{2NT/6}$
$T_{.01} = T - 2.576\sqrt{2NT/6}$
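
The paired formulae can be evaluated in the same way. The sketch below uses the variance expression $2NT/6$ exactly as it is given in the text; the exact null variance of the signed rank total, $N(N+1)(2N+1)/24 = (2N+1)T/6$, is very slightly larger, so the two differ little for the values of $N$ considered. Function and variable names are illustrative.

```python
import math

# Normal deviates used in the formulae for the three levels.
DEVIATES = {0.05: 1.960, 0.02: 2.326, 0.01: 2.576}


def paired_critical_totals(n):
    """Approximate totals of one sign for n paired replicates."""
    t = n * (n + 1) / 4            # expected total of one sign, T
    sd = math.sqrt(2 * n * t / 6)  # variance 2NT/6, as stated in the text
    return {p: round(t - z * sd) for p, z in DEVIATES.items()}


print(paired_critical_totals(20))
```

For $N = 20$ the result is 53, 43 and 37, matching the last row of Table II. For the smallest values of $N$ the normal approximation is rough and the tabulated entries need not coincide with the rounded formula values.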


GROUPED DATA 


Frequently the experimental results fall into two or more groups,
with two or more replicates per group, as for example when compari-
sons are carried out at different concentrations, times, localities, or
temperatures. If we consider one of the $n$ groups, and assign rank
numbers to the $2N$ results within a group in the manner described
under unpaired replicates, the expected total, $T$, for a treatment in one
group is $2N(2N+1)/4$ and the variance is $NT/6$. The expected total
for $n$ groups each with $N$ replicates is $nT$, and the variance of the total
is $nNT/6$ since the totals are additive and the variances also. The
formulae for calculating the total corresponding to the 0.05, 0.02,
0.01 level are:

$T_{.05} = nT - 1.960\sqrt{nNT/6}$
$T_{.02} = nT - 2.326\sqrt{nNT/6}$
$T_{.01} = nT - 2.576\sqrt{nNT/6}$

where $n$ is the number of groups, $N$ the number of replicates per group,
and $T$ is the expected total for one group.
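
A short sketch of the grouped calculation follows, again assuming rounding to the nearest whole number. The argument names `n_groups` and `n_reps` stand for $n$ and $N$ and are chosen only for readability.

```python
import math

# Normal deviates used in the formulae for the three levels.
DEVIATES = {0.05: 1.960, 0.02: 2.326, 0.01: 2.576}


def grouped_critical_totals(n_groups, n_reps):
    """Approximate rank totals for n_groups groups of n_reps replicates."""
    t = 2 * n_reps * (2 * n_reps + 1) / 4     # expected total for one group
    total = n_groups * t                      # expected total nT
    sd = math.sqrt(n_groups * n_reps * t / 6) # variance nNT/6
    return {p: round(total - z * sd) for p, z in DEVIATES.items()}


print(grouped_critical_totals(3, 2))
print(grouped_critical_totals(7, 7))
```

With three groups of two replicates the totals are 11, 10 and 9; with seven groups of seven replicates they are 327, 319 and 314. Both sets agree with Table III.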
TABLE I


UNPAIRED REPLICATES
Probability (P) of chance occurrence of a rank total equal to or less than T
with N replicates. T is given in body of table, to nearest whole number.


 N    P = 0.05   P = 0.02   P = 0.01
 5       18         16         15
 6       27         24         23
 7       37         34         32
 8       49         46         43
 9       63         59         56
10       79         74         71
11       97         91         87
12      116        110        105
13      137        130        125
14      160        152        147
15      185        176        170
16      212        202        196
17      241        230        223
18      271        259        252
19      303        291        282
20      338        324        315

TABLE II


PAIRED REPLICATES
Probability (P) of a chance occurrence of rank total of one sign, + or -,
whichever is least, equal to or less than T. T is given in body of table, to nearest
whole number. N is number of replicates.
(- marks a cell with no entry; ? marks a digit illegible in the source.)


 N    P = 0.05   P = 0.02   P = 0.01
 6        1          0          -
 7        2          ?          -
 8        4          2          0
 9        6          3          2
10        8          5          3
11       11          7          5
12       14         10          ?
13       18         13          9
14       22         16         12
15       26         20         15
16       3?         24         19
17       36         28         23
18       41         33         27
19       47         38         32
20       53         43         37


With unequal numbers of replicates in the different groups the ex- 
pected totals and variances must be calculated separately and added 
together, to obtain the final expected total and its variance. 
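
For unequal groups the same arithmetic applies group by group. The sketch below is an illustration under the assumption that each group still has equal numbers of replicates under the two treatments and that only the group sizes differ; the function name and the list argument are hypothetical.

```python
import math

# Normal deviates used in the formulae for the three levels.
DEVIATES = {0.05: 1.960, 0.02: 2.326, 0.01: 2.576}


def unequal_groups_critical_totals(reps_per_group):
    """Approximate rank totals when group i has reps_per_group[i] replicates."""
    total = 0.0
    variance = 0.0
    for n in reps_per_group:
        t = 2 * n * (2 * n + 1) / 4  # expected total for this group
        total += t                   # expected totals add
        variance += n * t / 6        # variances add
    sd = math.sqrt(variance)
    return {p: round(total - z * sd) for p, z in DEVIATES.items()}


print(unequal_groups_critical_totals([2, 3]))
```

When all groups have the same size the result reduces to the grouped formulae; for example, three groups of two replicates again give 11, 10 and 9.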
TABLE III


GROUPED DATA
Probability P of chance occurrence of a rank total equal to or less than T
with N replicates in n groups. T is given in body of table.
(? marks an entry illegible in the source.)


 N     P      n=2    n=3    n=4    n=5    n=6    n=7
 2   0.01      ?      9     13     17     22     26
 2   0.02      ?     10     14     18     23     27
 2   0.05      6     11     15     19     24     28

 3   0.01     13     21     30     39     49     58
 3   0.02     14     22     31     41     50     59
 3   0.05     15     24     33     42     52     62

 4   0.01     24     39     54     70     86    102
 4   0.02     25     40     56     72     88    105
 4   0.05     26     42     58     75     91    108

 5   0.01     38     61     85    110    135    160
 5   0.02     39     63     88    112    138    163
 5   0.05     42     66     91    116    142    168

 6   0.01     55     89    124    159    195    231
 6   0.02     57     92    127    162    198    235
 6   0.05     61     96    131    168    204    241

 7   0.01     77    123    170    218    266    314
 7   0.02     79    126    174    222    270    319
 7   0.05     83    131    179    228    277    327

TREATMENT OF THE FOURFOLD TABLE BY PARTIAL
ASSOCIATION AND PARTIAL CORRELATION AS IT
RELATES TO PUBLIC HEALTH PROBLEMS


Herbert L. Lombard, M.D., and Carl R. Doering, M.D.


Massachusetts Department of Public Health and the Harvard School of Public 
Health 


Many public health studies require treatment of plural variables 
and the bulk of data which is available falls into a dichotomous classi- 
fication. The following are examples of studies which have required 
analysis by a technique applicable to alternative categories: 

(a) An estimate of the risk of operative mortality for cancer 
patients was desired [1]. Obesity, malnutrition, old age, hyperten- 
sion, cardiac history and length of operation were the variables studied. 
Many of these were not quantitatively measurable, and for consistency 
the few that were measurable were consolidated into alternative cate- 
gories. 

(b) In another study aimed to determine the amount of cancer 
knowledge in relation to age and economic factors, the age could have 
been measured quantitatively, but the economic condition could not [2]. 

(c) At present a study is being conducted in which approximately 
80 variables dealing with heredity and environment as possible causa- 
tive factors of cancer are being considered. Over three-quarters of 
these variables cannot be measured quantitatively and the solution of 
the many problems inherent in this study depends on the treatment 
of the 2 x 2 table. 

If all variables in a study were independent, the examination of 
the data in the zero order of association would be sufficient. In the 
presence of data requiring the analysis of plural dependent variables, 
the two methods to be considered are partial association and partial 
correlation. The one is the effort to eliminate the fallacy of mixed 
classification by using partial universes; the other is the effort to 
eliminate the effect of certain variables in the whole universe, so as 
to be free of the fallacy of mixed classification. 

With a satisfactory coefficient of correlation the method of partial 
correlation could be used. This would be of great advantage in those 
studies in which the data are not sufficiently numerous to enable sub- 
division of all the variables. If the data are numerous enough, par- 
tial association could be used. This is the association between x and y 
in sub-universes. The more variables to be considered, the greater will
be the number of sub-universes to be studied. The elimination of the
effect of one variable requires two sub-universes; two variables, four
sub-universes; three variables, eight; four variables, sixteen, et cetera.
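
The doubling of sub-universes with each eliminated variable can be made concrete with a small sketch. It assumes each record is a dictionary of yes/no attributes; the field names and the sample records are invented for illustration.

```python
from itertools import product


def sub_universe_tables(records, x, y, controls):
    """Build one fourfold table of x against y per combination of controls.

    Each table is [[x&y, x&not-y], [not-x&y, not-x&not-y]].
    With k control variables there are 2**k sub-universes.
    """
    tables = {}
    for combo in product([True, False], repeat=len(controls)):
        table = [[0, 0], [0, 0]]
        for r in records:
            if all(r[c] == v for c, v in zip(controls, combo)):
                table[0 if r[x] else 1][0 if r[y] else 1] += 1
        tables[combo] = table
    return tables


# Hypothetical records; the attribute names are illustrative only.
records = [
    {"lectures": True, "good_score": True, "old": False, "urban": True},
    {"lectures": False, "good_score": True, "old": False, "urban": True},
    {"lectures": False, "good_score": False, "old": True, "urban": False},
]
tables = sub_universe_tables(records, "lectures", "good_score", ["old", "urban"])
```

Two control variables yield four tables, three yield eight, and so on, which is why the method requires numerous data.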

No additional computations are necessary when all partial uni- 
verses show either direct or inverse association. When the values of 
the individual subdivisions are not significant, or where some are 
significant and others are not, it is necessary to use some method that 
will determine the significance of the subdivisions in the aggregate. 
A generally accepted method is to sum all the chi squares to form a 
single value and to look up the probability in the table of chi square 
with m, the number of individual chi square values, as the degrees of freedom [3]. We have used
this method in those instances in which all the subdivisions had either 
all positive or all negative associations. Unfortunately this rarely 
occurs if several variables are to be eliminated. 

In recent public health studies [1, 2] the significance of the sub-
divisions in the aggregate has been determined by a slight modification
of the method suggested by Professor E. B. Wilson of Harvard Uni-
versity. This consisted of obtaining $x/\sigma$ for each fourfold table com-
prising the subdivisions, averaging these unit deviates and determining
significance. As the standard error of the normal curve is unity,
$1/\sqrt{N}$ was used for the standard error of the mean. If the
$M/\sigma_M$ equalled or exceeded 2.6, significance was assumed. While
$x/\sigma$ was usually the computed $(p_1 - p_2)/\sigma_{(p_1 - p_2)}$, in those
studies where the figures in the 2 x 2 tables were small, chi squares
were computed using Yates’ correction, the square roots of the various
chi squares were extracted, and the sign adopted as if
$(p_1 - p_2)/\sigma_{(p_1 - p_2)}$ had been computed. By this method the
benefit of Yates’ correction for small numbers was obtained and the
value of $x/\sigma$ was slightly smaller.
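
The averaging of unit deviates can be sketched as follows. This is an illustration under stated assumptions, not the authors' worksheet: N is taken to be the number of fourfold tables averaged, the deviate is the signed square root of chi square (which equals the difference of proportions over its pooled standard error when no correction is applied), and the tables are hypothetical.

```python
import math

def unit_deviate(a, b, c, d, yates=False):
    """Signed unit deviate for the fourfold table [[a, b], [c, d]].

    Rows are the two groups compared; the sign follows p1 - p2,
    where p1 = a/(a+b) and p2 = c/(c+d).
    """
    n1, n2 = a + b, c + d
    n = n1 + n2
    diff = a * d - b * c
    magnitude = abs(diff)
    if yates:
        magnitude = max(magnitude - n / 2, 0.0)  # Yates' continuity correction
    chi2 = n * magnitude ** 2 / (n1 * n2 * (a + c) * (b + d))
    return math.copysign(math.sqrt(chi2), diff)

def mean_deviate_ratio(tables, yates=False):
    """Mean of the unit deviates divided by its standard error 1/sqrt(N)."""
    deviates = [unit_deviate(*t, yates=yates) for t in tables]
    mean = sum(deviates) / len(deviates)
    return mean / (1 / math.sqrt(len(deviates)))

# Hypothetical sub-universes; a ratio of 2.6 or more was taken as significant.
tables = [(30, 10, 10, 30), (22, 18, 15, 25), (18, 22, 20, 20), (25, 15, 16, 24)]
print(mean_deviate_ratio(tables), mean_deviate_ratio(tables, yates=True))
```

Because the signs are retained, subdivisions with opposite directions of association offset one another in the mean, which is what the summed chi square cannot do.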


This is an excellent method if a considerable amount of data is 
available and if it is desired to eliminate the effect of several variables. 
The population in each subdivision should not be too small, and there 
should be enough subdivisions to give stability to the mean. In those
cases in which the size of the sample will not warrant subdividing the
data into as many orders as desirable, or where only one or two varia-
bles are to be eliminated, some other method of analysis seems neces-
sary.

Various coefficients of association for the 2 x 2 table have been used,
but there are only two correlation coefficients: the tetrachoric r and
the product-moment correlation coefficient described by Yule and


TABLE II
EXAMPLE OF THE PRODUCT-MOMENT CORRELATION COEFFICIENT FOR A 2 x 2 TABLE*

                     Good Score (A)    Poor Score (α)     Total
Lectures (B)               75                62             137
No Lectures (β)           605              1020            1625
Total                     680              1082            1762

$\delta = 75 - \dfrac{680 \times 137}{1762} = 22.2$

$r = \dfrac{22.2 \times 1762}{\sqrt{680 \times 1082 \times 1625 \times 137}} = .096$

The other zero-order correlations are:
    Good score and radio               .118
    Good score and solid reading       .296
    Good score and newspapers          .246
    Lectures and radio                 .110
    Lectures and solid reading         .117
    Lectures and newspapers            .106
    Radio and solid reading            .109
    Solid reading and newspapers       .380
    Radio and newspapers               .202

Partial correlation, good score and lectures:
$\sigma_z = \dfrac{1}{\sqrt{1729 - 3 - 3}}$, $\dfrac{z}{\sigma_z} = 2.09$


* 1763 records collected.
  1729 records had information on all variables.
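
The coefficient in Table II can be checked directly from the four cell counts. The sketch below uses the same arithmetic as the table (observed count minus the count expected under independence, times N, over the square root of the product of the four marginal totals); the function name is hypothetical.

```python
import math

def product_moment_r(a, b, c, d):
    """Product-moment correlation for the fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    delta = a - row1 * col1 / n  # observed minus expected count in the first cell
    return delta * n / math.sqrt(row1 * row2 * col1 * col2)

# Lectures / no lectures against good / poor score, counts from Table II.
r = product_moment_r(75, 62, 605, 1020)
print(round(r, 3))
```

With the Table II counts this agrees with the printed .096 to three decimals.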


Kendall [3]. In using the tetrachoric method of correlation, the
underlying assumption that the variables under consideration are
really continuous and normal in distribution must be borne in mind,
as well as the fact that the data for each trait have been forced into
two alternative categories.

At the beginning of the heredity and environment study the tetra-
choric coefficient computed from the diagrams prepared by Thurstone
and his associates [4] was used in the hope that the important variables
had a normal distribution. It became evident that this was not the
case and the coefficient was therefore abandoned. The product-
moment correlation coefficient for the 2 x 2 table was tried next and
partial correlations computed. This was wholly empirical as it was
not known whether the coefficient could be used in partial correlations.
After the partials were computed they were transformed into Fisher’s
z [5], and the results reduced to unit deviates by dividing each z by
its standard error.
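
The sequence just described (partial correlation, Fisher's z, division by the standard error) can be sketched with the zero-order coefficients quoted in Table II. Several points here are assumptions rather than statements of the authors' procedure: the partial is built by the usual recursion on lower-order partials, the standard error of z is taken as 1/sqrt(n - 3 - k) for k eliminated variables, the "Lectures and newspapers" entry is read as .106, and the short variable names are hypothetical.

```python
import math

# Zero-order product-moment coefficients as quoted in Table II.
R = {
    frozenset(("score", "lectures")): 0.096,
    frozenset(("score", "radio")): 0.118,
    frozenset(("score", "reading")): 0.296,
    frozenset(("score", "newspapers")): 0.246,
    frozenset(("lectures", "radio")): 0.110,
    frozenset(("lectures", "reading")): 0.117,
    frozenset(("lectures", "newspapers")): 0.106,
    frozenset(("radio", "reading")): 0.109,
    frozenset(("reading", "newspapers")): 0.380,
    frozenset(("radio", "newspapers")): 0.202,
}

def partial_r(i, j, controls):
    """Correlation of i and j with the variables in `controls` eliminated."""
    if not controls:
        return R[frozenset((i, j))]
    k, rest = controls[0], controls[1:]
    r_ij = partial_r(i, j, rest)
    r_ik = partial_r(i, k, rest)
    r_jk = partial_r(j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))

def z_over_sigma(r, n, eliminated):
    """Fisher's z divided by its standard error 1/sqrt(n - 3 - eliminated)."""
    return math.atanh(r) * math.sqrt(n - 3 - eliminated)

r = partial_r("score", "lectures", ("radio", "reading", "newspapers"))
print(r, z_over_sigma(r, n=1729, eliminated=3))
```

With these three-decimal inputs the unit deviate comes out a little above 2, in the neighbourhood of the value reported for lectures in Study C; exact agreement is not to be expected from rounded coefficients.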

Numerous comparisons between partial correlation and partial
association were made. The results of a few of these comparisons
are shown in Table III.


TABLE III
COMPARISON OF RESULTS USING PARTIAL ASSOCIATION AND PARTIAL CORRELATION
ON VARIOUS DATA

                                            Number of     Partial        Partial
                                            Variables     Association    Correlation
Variables                                   Eliminated    M/(1/√N)       z/σ_z

Study A (436 records)
  Cancer of cervix - use of spices              2            3.96           3.92
  Cancer of cervix - irregular eating           2            4.59           4.54
  Cancer of cervix - little hair                2            5.58           6.00
  Cancer of cervix - heavy menstruation         2            3.17           3.67
Study B (493 records)
  Cancer of breast - little hair                2            4.10           5.06
  Cancer of breast - infrequent bathing         2            5.87           5.90
Study C (1729 records)
  Knowledge of cancer - radio addresses         3            2.22           2.60
  Knowledge of cancer - magazines & books       3         [illegible]       9.40
  Knowledge of cancer - lectures                3            2.06           2.09
  Knowledge of cancer - newspapers              3            5.55           5.67
Study D (2091 records)
  Operative mortality - long operation          3            9.90          12.00
  Operative mortality - heart disease           3            2.39           3.30
Study E (10,092 records)
  Chronic disease - laxatives                   5           15.70          16.10

On the whole, agreement between results of the two methods seems 
satisfactory. Some of the comparisons showed considerable difference, 
but one would expect a few differences, some perhaps serious, due to 
chance alone. It appears that we can deal with the problem of plural 
variables by either method, and the results, considering the limitations 
of the methods, are reasonably consistent. We are satisfied when both 
methods show consistency. When the results differ as to significance, 
we feel less certain but would be inclined to favor partial correlation 
if n is small or if only one or two variables are being eliminated, and
to prefer the partial association technique if several variables are to be
eliminated and the size of the sample is sufficiently large.


CONCLUSIONS 


(1) Public health studies frequently require the use of the four-
fold table in the analysis of data containing plural variables.

(2) The tetrachoric r has only limited applicability since in most
studies some of the variables do not have a normal distribution.

(3) The additive property of chi square can be used if all subdivisions
have either positive or negative association.

(4) Partial association is a satisfactory method of analysis if the
elimination of several variables is desired and the number of cases is
sufficient for adequate populations in the sub-universes.

(5) Partial correlation, using the product-moment correlation 
coefficient for the 2x2 table, furnishes results sufficiently consistent 
with those of partial association as to warrant the opinion that it is a 
satisfactory method. 

(6) The choice of the method to be used depends upon the data 
available. 


TIME-SPECIFIC LIFE TABLES CONTRASTED WITH
OBSERVED SURVIVORSHIP*


MARGARET MERRELL 
School of Hygiene, The Johns Hopkins University 


In obtaining life tables for any group of individuals, biological or 
physical, there are two basically different types of observation which 
are commonly used. One of these consists in the classification of 
individuals at a fixed time into separate age groups and then in the 
recording of deaths for all the various ages simultaneously. The life 
table derived from this evidence is a time-specific or static life table 
because the deaths for all the ages are recorded at the same time. The 
second type of observations consists in following a group of indi- 
viduals born at approximately the same time, from birth to the death 
of the longest-lived member of the group. In this case the survivor- 
ship curve is determined by direct observation. It has been called a 
generation or cohort or fluent life table [1]. The life tables resulting 
from these two different types of evidence on longevity will, except 
under unusual circumstances, be different in form, and in any case 
will be quite different in meaning. 

To consider an example of the two life tables, suppose we wish to 
study the present day survival of Ford automobiles. We may ask 
two different questions: what is the longevity of Fords at the current 
rates of survival, or what is the survival of the current model of Ford? 
That is, in one case we ask about current rates of cars existing at the 
present time and in the other about the present model and its future 
rates. To answer the first question we might get the date of manu- 
facture (that is the age) of all the Fords registered on January 1, 1947, 
and during the following year get, by date of manufacture, the number 
of these cars irrevocably eliminated from service, that is, the deaths. 
Death in this case would be due to accidents, old age, or perhaps dis- 
ease and malformations. From the age-specific death rates we could 
construct a survivorship curve for Fords for the year 1947. It would 
give us the way in which a group of Fords would survive if the 1947 
rates pertained throughout their careers. The other life table could 
be obtained by following the 1947 models throughout their future 
existence to determine the number finally retired from service up to 
successive dates. This would give us the survivorship curve for the 
1947 models, by direct subtraction. These two survivorship curves
would be quite different and in this familiar problem everyone will
think of a number of reasons why this is so. The cars registered on
January 1, 1947 will be residues of different lots of cars, each remnant
having its own peculiar ability to survive 1947 conditions, depending
on age, basic quality, damage due to past experience, and nature of
repair. The common element for all of them is that their experience
is being studied for the year 1947. On the other hand the dynamic
life table for the 1947 models will have the experience of successive
years of time but the survivorship curve will be that of cars manu-
factured in the same year, and therefore will have a certain homo-
geneity which the fixed-time life table does not possess.

* Paper No. 233 from the Department of Biostatistics, School of Hygiene and Public
Health, The Johns Hopkins University.

All of the factors that I have mentioned as contributing to the 
difference in the two life tables for Fords have their direct counterpart 
in biological life tables. Inherent capacity, environmental hazards, 
methods of preventing and repairing damage, are continually shifting 
to a greater or less degree in the biological as well as in the physical 
world. The individuals existing at any time are survivors out of 
different lots of individuals and a single survivorship curve deter- 
mined by putting together their experience in a given period will be 
different from the survivorship of a given lot of individuals followed 
through their lives. Thus for biological species also, the static and 
the dynamic life tables will be different in both form and meaning. 

Yet frequently the choice of life table in this field is made solely 
on the issue of convenience. For long-lived forms, like man, it is not 
convenient to follow a group of individuals from birth to death, so for 
this reason alone we sometimes employ the static life table. For 
short-lived forms, like the beetle or fruit-fly, it is often more con- 
venient to start with a given brood and follow it through life than to 
keep track of the ages and deaths among different broods. The ques- 
tion of convenience has sometimes led to the error of comparing a 
specified-time life table for man with a generation life table for some 
other biological form. The scientific question at issue is the impor- 
tant point in these problems and that life table should be set up which 
is relevant to the scientific question. 

If we turn to human life tables we can see the effect of determining 
survivorship on a fixed-time basis as compared with a generation basis. 

Figure 1 gives the survivorship of males in Massachusetts for fixed 
times approximately 10 years apart from 1890 to 1940. The 1890 
life table, for example, shows how a hypothetical group would survive 
if it died off according to the 1890 age specific rates. We can see 
from this graph the improvement in the rates from 1890 to 1940.
But the graph also shows the other type of life table in giving the
survivorship for males born in 1890. This was constructed in the follow-
ing way. The people born in 1890 were infants in that year, 1 year olds
in 1891, 2 year olds in 1892, etc. The rates of dying at these ages in
these calendar years are therefore the rates pertaining to this group¹
as they pass through life. It is possible in terms of these rates to con-
struct a survivorship curve for the 1890 cohort from birth to their
present age of 56 years. I have carried this curve only to 50 years
since the last life table I was able to construct on a known population
distribution was for the census year of 1940. It is seen that the people
born in 1890 had a survivorship curve unlike that of any of the years
through which they lived. Conversely the survivorship based on the
rates of a given calendar year would not be the survivorship of any
group followed throughout life.

Figure 1. Percentage surviving from birth to successive ages, Massachusetts
males.

¹ Although migration effects keep the composition of this group from remaining
exactly the same in the sense of a laboratory experiment or an ideal follow-up, in a
practical sense the same group is being followed.
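
The two constructions can be set side by side in a small sketch. Everything numerical here is hypothetical: a three-age, three-year table of death rates, treated for simplicity as probabilities of dying within each age interval. The fixed-time curve reads down one calendar year; the generation curve reads along the diagonal, taking age 0 from 1890, age 1 from 1891, and so on.

```python
def survivorship(rates_by_age, radix=100.0):
    """Percentage surviving to the start of each successive age."""
    alive = [radix]
    for q in rates_by_age:
        alive.append(alive[-1] * (1.0 - q))
    return alive

# Hypothetical probabilities of dying, q[calendar year][age].
q = {
    1890: [0.20, 0.10, 0.10],
    1891: [0.18, 0.08, 0.08],
    1892: [0.16, 0.06, 0.06],
}

# Fixed-time (static) table: all ages taken from the single year 1890.
static_1890 = survivorship(q[1890])

# Generation (cohort) table: the 1890 births are age a in year 1890 + a.
cohort_1890 = survivorship([q[1890 + age][age] for age in range(3)])

print(static_1890)
print(cohort_1890)
```

When rates are improving from year to year, as in this made-up table, the cohort curve lies above the static curve for its year of birth, which is the pattern described for the 1890 Massachusetts males.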

Which of these curves do we want? If we want to compare the 
risks of dying in different calendar years we probably want the fixed- 
time life table. Whatever the multiplicity of factors in this year and 
in the past which produced the death rates of the calendar year, these 
rates were the actual current situation and the survivorship curve 
based on them helps us to see the implications of those rates. Certain 
aspects of the improvement in mortality with time, and the comparison 
of the mortality of different places at the same time are well shown by 
such curves. On the other hand, for certain prognosis problems, such
as have been considered by Lotka [2] on the future age structure of
our population, chances of orphanhood, and so on, the generation prob-
lem must certainly be considered.

If we consider the description of survivorship for other biological 
forms, the same issues arise. The question has been raised of getting
the life table for a forest. In this case it is clear that the two forms 
of life table need to be very sharply distinguished since they would be 
widely different. The fixed-time life table would show the effect on 
survivorship of a current program of selective cutting, along with the 
natural mortality of the trees. The generation life table would show 
the survivorship of the same generation of trees from seedlings to final 
death influenced by all the time changes in forest preservation or 
destruction which would accompany their life span; this would give us 
the information we would need in evaluating a large scale reforestation 
program. The two life tables would therefore supplement each other 
in providing knowledge on which to plan a sound program. 

If we consider the shorter-lived forms which are studied frequently 
in the laboratory, the contrast in the two types of life table is also great. 
The very fact of short life means that brief fluctuations in the environ- 
mental conditions may be very profound in their effect, and chance 
or deliberate alterations in environment will be different if they per- 
tain to a single generation at some stage in its life span or if they 
affect a population composed of all ages. 

I have been discussing a life table primarily in terms of a survivor- 
ship curve. I should like now to turn to a consideration of the age 
specific rates which are behind the survivorship curve and see how 
time is involved here. We study these age specific rates for their 
scientific meaning but the biological interpretation of the risk for a
given calendar year is very dependent on how the rates for successive
generations have been changing.

I want to illustrate this by a study on tuberculosis, made by Dr.
Wade Frost [3].

Figure 2. Age-specific death rates from tuberculosis for different calendar
years, Massachusetts males.


Figure 2 shows the age-specific rates for males of Massachusetts 
for deaths from all forms of tuberculosis at certain fixed periods of 
time from 1880 to 1940. It had been noted by students of the subject 
that this curve had undergone a drastic change in shape. Back in 
1880 the peak of mortality was in the 20’s and at successive periods 
of time the peak became later and later in age until in 1940 it was in 
the 60’s. Prior to Dr. Frost’s work on this, the interpretation had
generally been that tuberculosis was changing in its relation to man;
that at earlier times it was most devastating to young adults, but in
recent years, either the disease or the resistance of people had changed
so that the effects of the disease had become more serious later in life.
In other words, tuberculosis had become an old age disease.

In order to test this explanation Dr. Frost examined the death
rates for successive cohorts of people.


TABLE I 


AGE-SPECIFIC DEATH RATES PER 100,000 FROM TUBERCULOSIS FOR MASSACHUSETTS
MALES, 1880 TO 1940, WITH RATES FOR COHORT OF 1880 INDICATED


The columns given in Table I are the age-specific death rates in 
successive time periods which we have just seen. If now we consider 
the people who were under 10 in 1880 as one group, (called the 1880 
cohort), their death rates are in the 1880 column; they were 10 to 20 
in 1890, and their death rate is in the 1890 column; 20 to 30 in 1900, 
and so on. The diagonal stepped lines indicate the death rates for 
this group as they pass through life. Similarly the 1890 cohort may 
be followed through to age 50-59; the 1900 cohort to age 40-49, and 
so on. 

Now if we look at these cohort curves shown in Figure 3 we find 
that they all have their peak in the 20’s, that they are all approxi- 
mately the same shape, that there is no evidence at all that tuberculosis 
is becoming an old age disease. Each successive generation has its
greatest toll taken in the 20’s. What then is the explanation for the
apparent shift in the risks seen in Figure 2? It lies in the fact that
in successive generations the rates have come down at all ages, the
form of curve remaining the same. When we get the rates at a par-
ticular time, say 1940, we cut across the rates for all the various
cohorts, at different points on the age scale. We have the 30-year-old
rate from the 1910 cohort, the 40-year-old rate from the 1900 cohort,
etc. The older rates pertain to earlier cohorts when all the rates were
higher. Thus our 1940 curve does not represent the way tuberculosis
is affecting any single group of people in passing through their lives,
but rather shows that the old people have come through such heavy
exposure to the disease in their youth that there is more tuberculosis
among them than among even our 20-year-olds today. It would not
do to anticipate that our present 20-year-olds would show a higher
rate in old age than they have now. Tuberculosis is still primarily a
young person’s disease.

Figure 3. Age-specific death rates from tuberculosis for successive 10-year
cohorts, Massachusetts males.
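
The argument can be reproduced with made-up numbers. In the sketch below every cohort has the same age pattern, peaking in the 20's, and only the overall level falls from one cohort to the next; the shape, the starting level, and the 30 per cent decline per decade are all hypothetical and are not Frost's data.

```python
AGES = [0, 10, 20, 30, 40, 50, 60, 70]

# Hypothetical relative risk by age, identical for every cohort (peak at 20).
SHAPE = {0: 0.3, 10: 0.4, 20: 1.0, 30: 0.9, 40: 0.8, 50: 0.7, 60: 0.6, 70: 0.4}

def cohort_level(birth_year):
    """Hypothetical overall level: constant up to the 1860 cohort, then
    falling 30 per cent with each later 10-year cohort."""
    decades_after_1860 = max(0, (birth_year - 1860) // 10)
    return 400.0 * 0.7 ** decades_after_1860

def rate(birth_year, age):
    return SHAPE[age] * cohort_level(birth_year)

def cohort_curve(birth_year):
    """Rates met by one cohort as it passes through life."""
    return {age: rate(birth_year, age) for age in AGES}

def cross_section(calendar_year):
    """Rates seen at one calendar year, each age drawn from a different cohort."""
    return {age: rate(calendar_year - age, age) for age in AGES}

def peak_age(curve):
    return max(curve, key=curve.get)

print([peak_age(cohort_curve(b)) for b in range(1860, 1930, 10)])
print(peak_age(cross_section(1880)), peak_age(cross_section(1940)))
```

Every cohort peaks at the same age, yet the fixed-time curve for the later year peaks among the old, purely because the older ages are drawn from cohorts whose rates were higher throughout.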

It is the fact that age changes in a person are perfectly correlated 
with time changes that leads us to think of any age difference as rep- 
resenting a time flow. But age differences at a fixed time cannot be so 
interpreted. The very form in which the life table is put, that of
survivorship, tempts us still further to think of the curve as giving
us a flow of people through their lives. We must therefore distinguish
clearly between the case where it does really represent the survivorship 
of a group and the case where it represents in very picturesque form, a 
static set of age-specific rates.


PROFESSOR SNEDECOR RETIRES


G. W. Snedecor retired July 1 as head of the Statistical Laboratory, 
Ames, Iowa. He will remain associated with the Laboratory as 
research professor during the school year, continuing his teaching and 
consulting activities. His retirement comes 14 years after the Lab- 
oratory was founded in 1933. Professor Snedecor early took the lead 
and has done more than anyone else in this country towards promoting 
the application of modern statistical methods to biological research. 
He has a real understanding of the practical aspects in experimentation 
and insists on the use of common sense along with the statistical tools. 
We wish to take this opportunity to express our sincere personal ap- 
preciation for his patient instruction, wise counsel, and friendship. 
His influence holds a prominent place in the minds and purposes of 
many of us.


QUERIES
(51) 


QUERY: In connection with certain investigations into railroad prob- 
lems, it was necessary recently to determine whether the regression of 
certain series was linear or not. Fisher’s summation method was 
used. Is it in order to use this method to determine if the regression 
is linear, and also is the test given conclusive? 

Kindly state also whether, if F is significant after completing the second
stage, it is still necessary to continue to calculate the third and fourth
stages, etc. In other words, is it possible that if F is insignificant after
the second stage for it again to become significant subsequently, which 
will, of course, mean that while the fit of a horizontal straight line is 
better than that of the fit given by a quadratic equation, yet the fit of 
an equation of higher degree may be closer than that of the quadratic. 


ANSWER: My understanding is that you have values of Y corre- 
sponding to a set of values of X spaced at equal intervals, a single 
value of Y for each X. This may be a time-series. The summation 
method results in a set of additive mean squares, each associated with 
the coefficient of a polynomial. Some judgment as to the nature of 
the trend is arrived at by studying the graph and by relating it to the 
relative sizes of the successive mean squares. 

Tests of significance of the orthogonal coefficients depend on the
estimation of σ², the random component of variance: this σ² is the
variance of a residual of Y, assumed normally distributed for each X
and the same for all values of X; that is, it is independent of both
X and Y. If my guess is correct, you have no very good method for
estimating σ².

Lacking this, one method of procedure is to assume that some poly- 
nomial, of fifth degree for example, will follow the population regres- 
sion with sufficient fidelity to warrant the use of the model. On this 
assumption the analysis of variances of n values of Y will take this 
form: 


Source of Variation              Degrees of Freedom

Linear term                              1
Quadratic term                           1
Cubic term                               1
Quartic term                             1
Quintic term                             1
Remainder                              n - 6
Total                                  n - 1


Under the assumptions made, the remaining mean square with (n - 6)
degrees of freedom is an estimate of σ², and may be used to test each
orthogonal regression coefficient by use of t or F. Information from
these tests should be used to supplement that gained from the graph.

You may decide that the linear component of the regression is pre- 
dominant, the remaining curved components being negligible. In this 
sense, it may be said that you have ‘‘determined’’ (I’m a bit leery of
that word) that the regression is linear. 

To answer your last question, let us assume that you have found
the last three orthogonal coefficients to be negligible, and have adopted 
the quadratic to represent the regression. In this case, the mean 
square associated with the coefficient of the linear polynomial might 
be large or small depending on the shape of that portion of the parabola 
lying within the bounds of your data. Here the situation could arise 
that the mean square for the second stage of fitting be small and non- 
significant, yet be followed by a large mean square in the third stage. 
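
The analysis of variance described in this answer, and the possibility raised in the last paragraph, can be sketched as follows. This is not Fisher's summation method itself: the orthogonal polynomials are built by Gram-Schmidt on equally spaced X, which yields the same additive sums of squares. The data are hypothetical, a cubic trend plus small fixed disturbances, chosen so that the second stage is small and the third large.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def polynomial_anova(y, max_degree=5):
    """Additive sums of squares for successive orthogonal polynomial terms.

    Returns (stage_ss, error_ms, error_df): stage_ss[0] is the linear term,
    stage_ss[1] the quadratic, and so on; error_ms is the remainder mean
    square on n - max_degree - 1 degrees of freedom.
    """
    n = len(y)
    x = [i - (n - 1) / 2 for i in range(n)]  # equally spaced, centred
    basis = []
    for degree in range(max_degree + 1):
        v = [xi ** degree for xi in x]
        for b in basis:  # remove components along lower-degree polynomials
            coef = dot(v, b) / dot(b, b)
            v = [vi - coef * bi for vi, bi in zip(v, b)]
        basis.append(v)
    residual = list(y)
    stage_ss = []
    for degree, b in enumerate(basis):
        coef = dot(y, b) / dot(b, b)
        residual = [ri - coef * bi for ri, bi in zip(residual, b)]
        if degree > 0:  # degree 0 only removes the mean
            stage_ss.append(coef ** 2 * dot(b, b))
    error_df = n - max_degree - 1
    return stage_ss, dot(residual, residual) / error_df, error_df

# Hypothetical series: cubic trend plus small fixed disturbances.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3, -0.3, 0.2, -0.2, 0.1, 0.0]
y = [(i - 6) ** 3 / 10 + e for i, e in enumerate(noise)]

stage_ss, error_ms, error_df = polynomial_anova(y)
for name, ss in zip(["linear", "quadratic", "cubic", "quartic", "quintic"], stage_ss):
    print(name, round(ss, 3), "F =", round(ss / error_ms, 2))
```

Stopping at the quadratic stage on such a series would miss the cubic component entirely, which is why the later stages should still be computed.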


I hesitate to say that any of these tests are ‘‘conclusive’’ because
I am not sure what you mean by the word. Any test involves a state-
ment about probability and in a sense such statements can never be
conclusive. On the other hand, if the mathematical model describes
the population, then the statements about probability are exact; if
these statements lead you to a conclusion, then you might refer to the
test as conclusive. However, rational considerations are the final cri-
teria, not tests of significance.

GEORGE W. SNEDECOR


(52) 
QUERY: Upon reading the article by W. D. Baten and G. M. Trout 
entitled ‘‘A Critical Study of the Summation-of-difference-in-rank 
Method of Determining Proficiency in Judging Dairy Products’’ (Bio- 
metrics Bulletin, August 1946), I was impressed by the lack of sym- 
metry in the distribution of the rank correlation coefficient as given in 
Table 3. It would appear from the form of the function that it would be
symmetric. Upon referring to ‘‘The Advanced Theory of Statistics’’ by
M. G. Kendall, page 396, I found a distribution of $\sum d^2$ for values of n
running from 1 to 8. A linear transformation,
$1 - \dfrac{6 \sum d^2}{N(N^2 - 1)}$, upon the
distribution for n = 7 when grouped as was done in the Biometrics
article gives the following result:

                        Baten and Trout    M. G. Kendall

-1.000 to -0.875               86                 31
          -0.625              266                319
          -0.375              644                647
          -0.125              976                975
          +0.125             1096               1096
          +0.375              985                975
          +0.625              638                647
          +0.875              321                319
          +1.000               28                 31


I should like to inquire if something else was intended by Baten
and Trout.

I am somewhat surprised to find that both very high and very low 
values are considered significant with respect to judging dairy prod- 
ucts. 


ANSWER: No, nothing different was intended. The published table 
should have been as pointed out. Thanks for calling my attention to 
it. I do not know why it was published wrong. I take all the blame.

W. D. Baten 
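
The symmetry at issue in this query can be checked by brute force, since for n = 7 there are only 5,040 possible rankings. The sketch below assumes the usual rank correlation 1 - 6Σd²/(n(n² - 1)) and the class limits implied by the table above (±0.125, ±0.375, ±0.625, ±0.875); the function name is hypothetical.

```python
from itertools import permutations

def grouped_rho_counts(n=7):
    """Count the rankings of n items falling in each class of the rank correlation."""
    edges = [-0.875, -0.625, -0.375, -0.125, 0.125, 0.375, 0.625, 0.875]
    counts = [0] * (len(edges) + 1)
    denom = n * (n * n - 1)
    for perm in permutations(range(n)):
        sum_d2 = sum((i - p) ** 2 for i, p in enumerate(perm))
        rho = 1 - 6 * sum_d2 / denom
        counts[sum(rho > e for e in edges)] += 1  # index of the class containing rho
    return counts

counts = grouped_rho_counts(7)
print(counts, sum(counts))
```

Reversing a ranking changes the sign of the coefficient, so the grouped counts must read the same in both directions; for n = 7 no attainable value falls exactly on a class limit, so the grouping is unambiguous.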


(53) 

QUERY: I have a 3 x 3 Latin square experiment in which, for the
past 5 years, I have been comparing the effect of 3 kinds of manure on
cultivated blueberries. I have proved the main point of this experi-
ment, which is that manure is not sure death to cultivated blueberries,
as was reported a good many years ago. The yields from the horse
manure and cow manure plots are within a few quarts of each other
so that it is obvious that there is no difference between these two. The
yields from the poultry manure plots have been considerably below the
other two. The question, then, is: Is poultry manure significantly
inferior?

Since the blueberry is a perennial plant, how should the data be
treated? Should the 5 years be added together and treated as a single
year, or should they be kept separate?

Since several of the bushes on 2 of the plots were infected with a
virus disease, how can appropriate comparisons be made?


ANSWER: In answer to your main question, the total yields for the 
5 years should be used in testing the significance of differences among 
treatments, the structure of the analysis of variance being,

Source of Variation              Degrees of Freedom

Rows                                     2
Columns                                  2
Treatments                               2
Error                                    2


You are doubtless aware that, with so few degrees of freedom for error, 
you can expect little information beyond the fact that you mention— 
the plants were not killed. From this experiment, you don’t even 
know whether the manures had any effect at all; that is, whether they 
changed yield from that of untreated plots or of plots fertilized with 
commercial fertilizer. 
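
The structure above can be sketched for a 3 x 3 square of 5-year plot totals. The yields and the field layout below are entirely hypothetical (H, C, P standing for horse, cow and poultry manure); only the partition into rows, columns, treatments and error, each with 2 degrees of freedom, follows the answer.

```python
def latin_square_anova(yields, layout):
    """Sums of squares and degrees of freedom for a k x k Latin square.

    yields[r][c] is the plot total in row r, column c;
    layout[r][c] names the treatment applied to that plot.
    """
    k = len(yields)
    grand = sum(sum(row) for row in yields)
    correction = grand ** 2 / (k * k)
    totals = {}
    for r in range(k):
        for c in range(k):
            totals[layout[r][c]] = totals.get(layout[r][c], 0.0) + yields[r][c]
    ss = {
        "rows": sum(sum(row) ** 2 for row in yields) / k - correction,
        "columns": sum(sum(col) ** 2 for col in zip(*yields)) / k - correction,
        "treatments": sum(t ** 2 for t in totals.values()) / k - correction,
    }
    total_ss = sum(v ** 2 for row in yields for v in row) - correction
    ss["error"] = total_ss - ss["rows"] - ss["columns"] - ss["treatments"]
    df = {"rows": k - 1, "columns": k - 1, "treatments": k - 1,
          "error": (k - 1) * (k - 2)}
    return ss, df, total_ss

# Hypothetical 5-year totals (quarts) and a hypothetical field layout.
layout = [["H", "C", "P"], ["C", "P", "H"], ["P", "H", "C"]]
yields = [[52, 50, 40], [49, 38, 54], [41, 51, 50]]

ss, df, total_ss = latin_square_anova(yields, layout)
f_ratio = (ss["treatments"] / df["treatments"]) / (ss["error"] / df["error"])
print(df, {name: round(value, 3) for name, value in ss.items()}, round(f_ratio, 2))
```

With only 2 degrees of freedom for error the F ratio must be very large to reach significance, which is the vulnerability the answer points out.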

‘‘Is poultry manure significantly inferior?’’ is a moot question
which has been discussed before in this column (Vol. 1, page 26 and
Vol. 2, page 16). From some research just completed by Dr. David
B. Duncan, it seems that, in your particular case (two means at one
end of the range with one at the other), it is appropriate to apply the
t-test to the difference between the largest and smallest means.

In regard to the plants affected by disease, if it may be assumed
that the other plants in the same plots were unaffected either by the
disease or by lesser competition, it will be sufficiently accurate to use
the average plot yields per plant in calculating the analysis of variance.
With so few degrees of freedom for error, you are in a very vulnerable
position; hence, any great refinements in statistical methods are super-
fluous.

GEORGE W. SNEDECOR


CORRECTION FOR QUERY (46), Biometrics, Vol. 3, No. 2, June, 
1947: Page 95, line 9, omitting the word ‘‘not’’, should read: 

‘‘The method of estimating the missing value by minimizing SSE
does give an unbiased estimate of σ².’’


NEWS AND NOTES


Our Biometrics Section chairman, D. B. DELURY, has left Virginia 
Polytechnic Institute at Blacksburg to return to Canada. Cooler with 
no duties?! He is now with the Ontario Research Foundation, 43 
Queens Park, Toronto, Canada. ... The Ontario Research Foundation 
was established by Provincial Act in 1928 and is an endowed institu- 
tion, half of its endowment being contributed by the Ontario Govern- 
ment and half by industry. A. E. R. WESTMAN, Director of Chemical 
Research, describes the work of the foundation briefly, ‘‘It carries 
on scientific research for industry, and to some extent for agriculture, 
forestry and fisheries in the fields of chemistry, metallurgy, biochem- 
istry, textiles, parasitology, climatology and soils. During the war 
we were asked to assist in the control of manufacture of tank shoes.’’ 
This work is described in an article by A. E. R. Westman and R. W.
S. FREEMAN, ‘‘Statistical Control of the Manufacture of Manganese
Steel Tank Shoes,’’ Canadian Metals and Metallurgical Industries, 
June 1945. Mr. Westman says that there has been an increasing de- 
mand for help with statistical methods both within the Foundation and 
from other institutions and government departments. Call upon D. B. 
DeLury. . . . Reconstruction has been in progress in the University 
of Cambridge, England, since the return of the statisticians, who were 
all away on war service, the fort meantime being held by J. O. IRWIN
who was in residence there. JOHN WISHART, reader in statistics, has 
resumed lecturing to mathematicians and to biologists. In the former 
sphere he is assisted by M. S. BARTLETT and H. E. DANIELS, lecturers in 
the Faculty of Mathematics. A Diploma in Mathematical Statistics 
has been established from 1 October 1947. This is a one-year course 
open to graduates with a reasonable standard in mathematics, who 
will study in the Statistical Laboratory (for which a temporary 
building will be constructed) the theory of statistics and its appli- 
cation to a selected one out of a wide number of fields ranging 
(alphabetically) from Agriculture to Psychology. In the biological 
sphere the University is appointing a Committee to consider what 
developments should be undertaken. In the Faculty of Economics 
and Politics, C. F. CARTER is lecturer in Statistics; and a Department
of Applied Economics has started functioning under the Directorship
of RICHARD STONE. Statistical problems in the Psychological Lab-
oratory are looked after by E. G. CHAMBERS, and similar problems in
relation to their particular sphere are beginning to engage the atten-
tion of the Department of Medicine. . . . RAE H. HARRIS, cereal tech-
nologist, Agricultural Experiment Station, Fargo, North Dakota— 
welcome to our ranks! He states, ‘‘We are using statistical methods 
in evaluating our departmental data, but I still feel that it is a difficult 
problem for the research worker who is not a professional statistician 
to select the proper techniques for securing the largest amount of 
information from his data.’’ . . . It is possible we are seeking new 
subscribers to Biometrics. LEROY POWERS, Horticultural Field Station, 
Cheyenne, Wyoming, says, ‘‘I am sure that there is a real need for 
a journal of statistical methodology, and therefore, today am sending in 
my application for membership.’’ During the last few months several
letters have been received commenting on the kind of articles you, the
readers, want. These comments are helpful and an effort will be made
to secure such articles. . . . HAROLD F. DORN, Federal Security Agency,
United States Public Health Service, Washington, D. C., gave some
excellent suggestions. . . . P. A. MINGES, Extension Specialist in Truck
Crops, College of Agriculture, Davis, California, seems to be falling in
line with the rest of the people in California. He likes the West!
However, he does falter maybe: ‘‘I do not believe that we raise quite
as good strawberries as we did in Iowa, but of course one reason is 
that I am not working on that crop. My work deals primarily with 
vegetables.’’ Let’s hope no difficulties are introduced by publishing 
disloyalty. Come see what can be done in the South. ... It has been 
observed that the acting editor published quotations which can have 
a variety of meanings when isolated from the text. Anyhow, it is some 
comfort to find that others who have visited in Honolulu understand.
. . . CHAUNCEY D. LEAKE, Vice President, The University of Texas,
Medical Branch, Galveston, enjoyed Hawaii a year ago in the spring.
He writes, ‘‘We are very interested in the development of statistical
analysis in our institution, and J. ALLEN SCOTT, Professor of Preventive
Medicine and Statistician for John Sealy Hospital, is in charge of this
effort for us.’’ . . . JOHN C. ANDERSON, Assistant Research Specialist in
Farm Crops, New Jersey Agricultural Experiment Station, New
Brunswick, writes, ‘‘Professor Snedecor’s answers to various questions,
I find quite enlightening.’’ . . . R. E. PATTERSON is now
Assistant Director, Texas Agricultural Experiment Station, College
Station. Does that mean he will have added interest in statistical
methods? . . . JOSEF BROZEK, Laboratory of Physiological Hygiene,
University of Minnesota, Minneapolis, writes, ‘‘One of the functions
of Biometrics should be to spread biometrical methods to all fields in
which quantitative data are used. In physiology much attention has
been paid to the methods by which data are secured, but the statistical
evaluation is frequently neglected.’’ . . . W. D. BATEN is now Chief,
Operations Analysis Branch (A-5), Air Defense Command, Mitchell
Field, New York. He was formerly with Michigan State College, East
Lansing. . . . H. J. MILLER, Assistant Professor of Plant Pathology,
The Pennsylvania State College, State College, is the only person (as 
far as we know) who reads the editorials in Biometrics. He states, ‘‘I 
wish to say that I have found this publication to be very useful in its 
present form. For the average biological worker it affords an oppor- 
tunity to have questions answered by a competent statistician such as 
is not available in many of our institutions.’’ Now, we will see if he 
reads the NEWS AND NOTES. ... D. B. SHANK, who was Assistant 
Agronomist, University of Arkansas, Fayetteville, is now Associate 
Agronomist with the Agronomy Department, Agricultural Experiment 
Station, Brookings, South Dakota. His work consists of corn breeding 
and of corn variety testing. He finds different problems in South 
Dakota from those met in the South. For instance, ‘‘the short growing
season and the resulting necessity for early hybrids is something
that I did not have to worry about before.’’ But look how level the
land is out there, more homogeneous soil! . . . P. V. SUKHATME, 
statistical adviser, Imperial Council of Agricultural Research, New 
Delhi, India, reports that, ‘‘During the last few years the Statistical 
Section of the Imperial Council of Agricultural Research has carried 
out sample surveys for estimating the yield per acre of rice in all the 
rice-growing districts of the United Provinces, Central Provinces, Bom- 
bay, Madras, Orissa and Bihar.’’ In the Annual Report (1946-1947) 
of the Social Science Research Center, University of Puerto Rico, Rio 
Piedras, of which CLARENCE SENIOR is Director, is found a report of a 
human biology study. HARRY SHAPIRO is directing a joint project 
with the Center, the American Museum of Natural History and the 
Department of Anthropology of Columbia University. The project is 
to examine the relationship between soil, water supply, food and the 
biological status of man, including the incidence of inherited defects. 
Anthropometric measurements will be made later in the study to check 
on nutritional inadequacies. . . . R. E. COMSTOCK has returned to the
Institute of Statistics as Professor of Genetic Statistics. For the last
year and a half he has been at Agricultural Experiment Station, Rio
Piedras, as Head of the Animal Industry Department. . . . M. S. BARTLETT
has been appointed to a newly-established Professorship of
Mathematical Statistics at Manchester University. . . . The Statistical
Summer Session at the Virginia Polytechnic Institute from August 5 to
September 9 was attended by 166 persons from 33 states and 5 foreign
countries. The list of courses included ‘‘Mathematics of the Design
of Experiments’’ by R. C. BOSE of the University of Calcutta, India,
‘‘Design of Animal Experiments’’ by H. L. LUCAS, University of North
Carolina, ‘‘Statistical Methods’’ by G. W. SNEDECOR of Iowa State
College, ‘‘Mathematical Statistics’’ by G. W. BROWN of Iowa State, and
‘‘Engineering Statistics’’ by BOYD HARSHBARGER of Virginia Polytechnic
Institute. There were three courses in Sampling covering the fields of
Interview Technique, Sampling Methods, and the Mathematics of
Sampling. They were given by Drs. METZNER and CANNELL of the
University of Michigan and E. E. HOUSEMAN and W. A. HENDRICKS of the
B.A.E. Among the seminar speakers were: W. F. CALLANDAR, MORRIS
HANSEN, CHARLES F. SARLE, HAROLD HOTELLING, E. S. KEEPING of Canada,
P. J. RULON, W. E. DEMING, G. W. SNEDECOR, G. W. BROWN, M. G. KENDALL
of England, R. C. BOSE of India, and GERTRUDE COX.
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Officers of the American Statistical Association: President, Willard L. Thorp; 
Directors, Isador Lubin, Lowell J. Reed, Walter A. Shewhart, Samuel A. Stouffer,
Helen M. Walker, Samuel S. Wilks; Vice-Presidents, Chester I. Bliss, Philip M.
Hauser, Stacy May, Jacob Marschak, Jerzy Neyman, Frank W. Notestein, George
W. Snedecor, Aryness Joy Wickens; Secretary-Treasurer, Lester S. Kellogg.

Officers of the Biometrics Section: Chairman, D. B. DeLury; Secretary, H.
W. Norton; Section Committee, Geoffrey Beall, E. J. DeBeer, D. B. DeLury, D. J.
Finney, H. W. Norton and J. W. Tukey.

Editorial Committee for Biometrics: Chairman, Gertrude M. Cox; Members,
R. L. Anderson, C. I. Bliss, W. G. Cochran, Churchill Eisenhart, H. W. Norton,
G. W. Snedecor and C. P. Winsor; collaborating editors, W. J. Dann, D. J. Finney,
G. E. Dickerson, H. O. Halverson, C. M. Mottley, J. G. Osborne.

Material for Biometrics should be addressed to the Chairman of the 
Editorial Committee, Institute of Statistics, North Carolina State College, Raleigh, 
N. C.; and material for Queries should go to ‘‘Queries,’’ Statistical Laboratory, 
Iowa State College, Ames, Iowa, or to any member of the committee.


ON THE ESTIMATION OF BIOLOGICAL POPULATIONS

D. B. DELURY

Ontario Research Foundation


Introduction. Attempts to give mathematical formulations of biological
systems are not uncommon. However, the results of these studies
have not found their way into biological practice to the extent
they deserve, possibly because insufficient attention has been given to
such questions as the experimental verification of assumptions and the
estimation of parameters.

It is the writer’s belief that a fusion of a mathematical development,
based on certain assumptions, with an experimental programme
designed to test these assumptions and to provide estimates of whatever
parameters are involved, will often yield results of the highest importance.
Some simple examples are given in this article which discusses
information of the type yielded by the ‘‘creel census.’’


Statement of the Problem. Many studies of biological populations 
involve the capture of a considerable proportion of the individuals 
in these populations. Usually the records are or could be kept in such 
a way as to show, for each day (or other period of time), the number 
of individuals caught and also the amount of ‘‘effort’’ expended in 
capturing them. Records of this kind have been used to exhibit the 
‘‘availability’’ (defined as the ratio of total catch to total effort) of
fish in consecutive fishing seasons. It is clear, though, that a com- 
parison of availabilities cannot be translated into a comparison of 
population sizes without invoking some serious assumptions which can- 
not, as a rule, be tested. The purpose of this paper is to show that 
catch-effort records can be used to form estimates of absolute popula- 
tion sizes and to test certain important assumptions. 

It will be assumed, then, that catch and effort records are available
for a series of consecutive time intervals. The catch for a given time
interval, specified by t, will be denoted by c(t) and the corresponding
effort by e(t). The catch per unit effort for the time interval t is then
C(t) = c(t)/e(t). Now, as a population becomes depleted, the value
of C(t) ordinarily decreases, and the amount by which C(t) is diminished
reflects the extent of the depletion which, in turn, depends on the
total catch and the total effort. This qualitative statement may be
made quantitative under suitable assumptions. This is attempted in
the following examples.




Example 1. A Study of a Lobster Population. It may be expected
that a population of lobsters provides one of the simplest subjects
for a population study, because migration is believed to be unimportant,
and recruitment through growth takes place only at times of
moult. It seems not unreasonable, therefore, to make assumption (i).

(i) The population is closed, that is, the effects of migration and 
natural mortality are negligible and the time interval is to be restricted 
so as to exclude times of moult. 

Assumption (i) alone is not strong enough to support an analysis 
of the catch-effort records, in that it supplies no information about the 
response of the population to the method of capture. Let d(t) rep- 
resent the proportion of the population captured during the time in- 
terval t. Some assumption about the nature of d(t) is required. 

(ii) d(t) = k(t) e(t), that is, the units of effort employed during
interval t, e(t) in number, do not compete with one another. k(t) is
evidently the proportion of the population captured during interval t
by one unit of effort. It is proposed that k(t) be called the catchability
and e(t) the intensity of effort.

(iii) k(t) = k, a constant. This assumption is, of course, seriously
open to doubt and must be tested carefully against observations before
any conclusions are based on it.

It is shown in Appendix I that, under these three assumptions, the 
following relations hold. 


(4)*   log C(t) = log [k N(0)] - k E(t);
(5)    C(t) = k N(0) - k K(t),


where E(t) and K(t) stand for the total effort and total catch up to
interval t, and N(t) is the number of individuals in the population at
time t. Now values of C(t), E(t) and K(t) can be calculated from
the catch-effort records and plotted, log C(t) against E(t) and C(t)
against K(t). If the graphs turn out to be
acceptably straight, estimates of k and N(0) follow at once. If the
graphs exhibit serious curvature, some modification of the assumptions
is required. In the event that the assumptions are not contradicted by
the data, the estimates of k and N(0) can be substituted in formula
(2’) of Appendix I to exhibit the behaviour of the population throughout
the whole period.
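
The linear relation (5) can be checked on a small simulated population. The sketch below is only an illustration: the values of k, N(0) and the efforts are hypothetical, and the removals are applied one interval at a time, which is a discrete stand-in for the continuous development of Appendix I. K(t) and E(t) are taken as the totals accumulated before each interval.

```python
import math

# Hypothetical values, chosen only for illustration (not from the paper).
k = 0.008                                   # catchability per unit of effort
n0 = 100.0                                  # initial population N(0)
efforts = [5.0, 8.0, 6.0, 9.0, 7.0, 4.0]    # e(t) for successive intervals

population = n0
cum_catch = 0.0     # K(t): total catch before interval t
cum_effort = 0.0    # E(t): total effort before interval t
rows = []           # (E(t), K(t), C(t)) for each interval
for e in efforts:
    catch = k * e * population      # assumptions (ii) and (iii): d(t) = k e(t)
    rows.append((cum_effort, cum_catch, catch / e))
    population -= catch             # assumption (i): capture is the only change
    cum_catch += catch
    cum_effort += e

for E, K, C in rows:
    eq5 = k * n0 - k * K                    # right-hand side of equation (5)
    eq4 = math.log(k * n0) - k * E          # right-hand side of equation (4)
    print(f"E={E:5.1f}  K={K:6.2f}  C={C:.4f}  eq5={eq5:.4f}  "
          f"log C={math.log(C):.4f}  eq4={eq4:.4f}")
```

In this discrete version equation (5) is reproduced exactly, while equation (4) holds only approximately, the gap growing slowly with the accumulated effort.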

Table I gives a day-by-day record of the catch of lobsters in
pounds, and the effort, measured by the number of traps fished each
day. These records were made in 1944 in the Tignish area of Prince
Edward Island. Other records, not presented here, show that the distribution
of size remains fairly constant in samples taken throughout
the season. Thus ‘‘pounds’’ may be identified with ‘‘number of individuals’’
without exposure to bias.

* The numbering of these equations comes from Appendix I.


TABLE I

Column 1       2       3            4      5                6      7

Date      Pounds   Traps   C(t) x 100   K(t)   log10 C(t) + 1   E(t)

May  2       147     200           74      0              .87      0
     3      2796    3780           74      0              .87      0
     4      6888    7174           96      3              .98      4
     5      7723    8850           87     10              .94     11
     8      5330    5793           92     18              .96     20
     9      8839    9504           93     23              .97     26
    10      6324    6655           95     32              .98     35
    11      3569    3685           97     38              .99     42
    12      8120    8202           99     42             1.00     46
    13      8084    8585           94     50              .97     54
    15      8252    9105           91     58              .96     62
    16      8411    9069           93     66              .97     72
    17      6757    7920           85     74              .93     81
    18      1152    1215           95     81              .98     89
    20      1500    1471          102     82             1.00     90
    22     11945   11597          103     84             1.01     91
    23      6995    8470           82     96              .91    103
    24      5851    7770           75    103              .88    111
    25      3221    3430           94    109              .97    119
    26      6345    7970           80    112              .90    122
    27      3035    4740           83    118              .92    130
    29      6271    8144           89    121              .95    135
    30      5567    7965           70    128              .84    143
    31      3017    5198           58    133              .76    151
June 1      4559    7115           64    136              .81    156
     2      4721    8585           55    141              .74    164
     5      3613    6935           52    145              .72    172
     6       473    1060           45    149              .65    179
     7       928    2070           45    149              .65    180
     8      2784    5725           49    150              .69    182
     9      2375    5235           45    153              .65    188
    10      2640    5480           48    156              .68    193
    12      3569    8300           43    158              .63    199
                                         162                     207


It is convenient to define the unit of effort as 1000 traps fished for
one day and the unit of catch as 1000 pounds. Column 4 of Table I
shows the values of C(t) (multiplied by 100), obtained by dividing the
number of units in the daily catch by the corresponding number of units
of effort. Column 5 contains the values of K(t), formed by accumulating the
numbers in Column 2. Note that K(t) is the total catch up to the t-th
day. Similarly the numbers in Column 7 give the accumulated effort
up to the t-th day.
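
The construction of Columns 4, 5 and 7 can be sketched for the first five rows of Table I. The code assumes that K(t) and E(t) are the totals accumulated before each day, which is how the first rows of the table read; the variable names are invented for the example.

```python
# First five rows of Table I: (date, pounds caught, traps fished).
records = [
    ("May 2", 147, 200),
    ("May 3", 2796, 3780),
    ("May 4", 6888, 7174),
    ("May 5", 7723, 8850),
    ("May 8", 5330, 5793),
]

table = []        # (date, C(t) x 100, K(t), E(t)), rounded as in Table I
cum_catch = 0.0   # K(t): catch before day t, in units of 1000 pounds
cum_effort = 0.0  # E(t): effort before day t, in units of 1000 trap-days
for date, pounds, traps in records:
    catch_units = pounds / 1000.0
    effort_units = traps / 1000.0
    c_per_effort = catch_units / effort_units        # C(t)
    table.append((date, round(100 * c_per_effort),
                  round(cum_catch), round(cum_effort)))
    cum_catch += catch_units
    cum_effort += effort_units

for row in table:
    print(row)
```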

The values of log10 C(t) are plotted against those of E(t) in Figure 1.
Clearly these points cannot be considered to lie on a straight line.
However, after May 22, the values of log10 C(t) decrease fairly regularly
and it seems not unreasonable to fit a straight line to the records from
May 23 to June 12. Therefore, for the present, the records up to May
23 will be disregarded and totals of catch and effort calculated from
this point on. The numbers are shown in Table II and the plotted
points and fitted line in Figure 2. Details of the fitting are discussed
in Appendix II.
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The equation of the line is 
     log10 C(t) = 1.78529 - 0.0036256 E(t).


Now equation (4) is written in terms of natural logarithms. The
equivalent form using common logarithms is

(4’)    log10 C(t) = log10 [k N(0)] - k log10(e) E(t).

Hence log10 [k N(0)] will be identified with 1.78529 and k log10(e) with
0.0036256. Direct computation then yields


     k = 0.0083482
     N(0) = 112.34 units of catch = 112340 pounds


N(0) here represents the number of pounds in the population at the
end of May 22.
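
As a rough check on this computation, the sketch below fits a least-squares line to the E(t) and log10 C(t) + 1 columns of Table II (cross-checked against the matching rows of Table I) and converts the slope and intercept into k and N(0). Ordinary least squares is an assumption here, since Appendix II is not reproduced in this section, and the variable names are invented for the example.

```python
import math

# Table II: accumulated effort E(t) and log10 C(t) + 1, May 23 to June 12.
effort = [0, 8, 16, 19, 27, 32, 40, 48, 53, 61, 69, 76, 77, 79, 85, 90, 96]
log_c_plus_1 = [.91, .88, .97, .90, .92, .95, .84, .76, .81, .74, .72,
                .65, .65, .69, .65, .68, .63]

n = len(effort)
mean_x = sum(effort) / n
mean_y = sum(log_c_plus_1) / n
sxx = sum((x - mean_x) ** 2 for x in effort)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(effort, log_c_plus_1))

slope = sxy / sxx                           # equals -k log10(e)
intercept = mean_y - slope * mean_x - 1.0   # log10 [k N(0)], with the +1 removed

k = -slope * math.log(10)                   # k = -slope / log10(e)
n0 = 10 ** intercept / k                    # N(0), in units of 1000 pounds

print(f"slope = {slope:.7f}  k = {k:.7f}  N(0) = {n0:.2f}")
```

With the rounded table values this gives a slope close to -0.0036256 and values of k and N(0) close to those quoted. The mean of the tabulated log10 C(t) + 1 values is 0.78529, which matches the decimal part of the constant quoted in the text, whereas the intercept at E(t) = 0 implied by the quoted k and N(0) is about -0.028.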
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Substitution of these numbers in formula (2’) provides an equation 
which describes the population throughout the range May 22 to June 12,


     N(t) = 112,340 e^{-0.0083482 E(t)},


where the E(t) values are those of Table II. In particular, putting
E(t) = 104, the number of pounds in the population at the end of June
12 is calculated to be 47143 pounds. Thus, according to formula (2’),
the population decreased, from May 22 to June 12, by 65197 pounds.
The actual catch, during the same interval, was 65964 pounds.
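
The arithmetic of this paragraph can be sketched as follows, assuming that formula (2’) has the exponential form N(t) = N(0) e^{-k E(t)} implied by equation (4). The inputs are the rounded figures quoted in the text, so the result agrees with the quoted values only to within a few pounds.

```python
import math

k = 0.0083482        # catchability per 1000 trap-days (estimate quoted above)
n0 = 112_340         # N(0) in pounds, at the end of May 22
total_effort = 104   # E(t) at the end of June 12, in units of 1000 trap-days

remaining = n0 * math.exp(-k * total_effort)   # population left on June 12
removed = n0 - remaining                       # calculated decrease
print(round(remaining), round(removed))
```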
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The close agreement of these two numbers reflects chiefly the good- 
ness of fit of the straight line to the plotted points and gives no direct 
support to the estimate of N(0). However, nothing in the results of
the computations is in conflict with the assumptions according to which 
these results are obtained and interpreted. It is true that other situations
than those assumed here could lead to a linear relationship between
log C(t) and E(t), but these would require a balance among
several variables which are unlikely to admit such balance. It is to
be supposed, therefore, that the straightness of the line gives consider- 
able support to the assumptions on which this computation is based. 
However, no decision on the adequacy of the assumptions can be made 
with confidence until these investigations are planned with a view to 
supplying the information needed for a critical test. Good tagging 
records, for example, or a breakdown of the catches according to size 
or age, will often indicate one set of assumptions in preference to 
others. 
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TABLE II 

Date      C(t) x 100   K(t)   log10 C(t) + 1   E(t)

May 23            82      0              .91      0
    24            75      7              .88      8
    25            94     13              .97     16
    26            80     16              .90     19
    27            83     22              .92     27
    29            89     25              .95     32
    30            70     32              .84     40
    31            58     37              .76     48
June 1            64     40              .81     53
     2            55     45              .74     61
     5            52     50              .72     69
     6            45     53              .65     76
     7            45     54              .65     77
     8            49     55              .69     79
     9            45     57              .65     85
    10            48     60              .68     90
    12            43     62              .63     96


During the period from May 2 to May 22, the catch per unit of
effort remains practically constant. It would seem that little can be
gained by discussing these observations, because it is clear that
catch-effort records in which the catch per unit effort is constant can supply
no worth-while information about the population unless, on the basis
of other information, this constancy can be accounted for. No such
information is available here.


Instead of plotting log C(t) against E(t) the values of C(t) may
be plotted against those of K(t). A straight line fitted to these points
is to be interpreted according to equation (5). When the values of
C(t) and K(t) in Table I are plotted, a set of points much like those
of Figure 1 is obtained. This graph is not shown here. Figure 3
shows the values of C(t) and K(t) from Table II and the line fitted
to them. This line has the equation


     C(t) = 0.92873 - 0.0079835 K(t)

which yields the estimates

     k = 0.0079835,   N(0) = 116.33 units of catch = 116330 pounds.

These numbers agree reasonably well with those based on equation (4).
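
The same kind of check can be made for equation (5). The sketch below fits a least-squares line to the C(t) and K(t) columns of Table II (cross-checked against Table I); ordinary least squares is again an assumption, and the names are invented for the example. Because the tabulated values are rounded, the result matches the quoted estimates only approximately.

```python
# Table II: accumulated catch K(t) and catch per unit effort C(t).
cum_catch = [0, 7, 13, 16, 22, 25, 32, 37, 40, 45, 50, 53, 54, 55, 57, 60, 62]
c_per_effort = [.82, .75, .94, .80, .83, .89, .70, .58, .64, .55, .52,
                .45, .45, .49, .45, .48, .43]

n = len(cum_catch)
mean_x = sum(cum_catch) / n
mean_y = sum(c_per_effort) / n
sxx = sum((x - mean_x) ** 2 for x in cum_catch)
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(cum_catch, c_per_effort))

slope = sxy / sxx                      # equals -k in equation (5)
intercept = mean_y - slope * mean_x    # equals k N(0)

k = -slope
n0 = intercept / k                     # also the intercept on the K(t)-axis

print(f"k = {k:.7f}  N(0) = {n0:.2f} units of 1000 pounds")
```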


Equation (5) is somewhat simpler for numerical work than equation
(4), because logarithms are not needed. Also, equation (5) furnishes
a graphical estimate of population size, given by the intercept
of the line on the K(t)-axis. It might be concluded, therefore, that
equation (4) should be dropped entirely in favour of equation (5).
However, the final decision on this matter must await the development
of an adequate sampling theory.

Another point which indicates that both lines should be used arises
from the fact that, in most cases, the assumptions on which these equations
are based will not be entirely fulfilled. The inadequacy of the
assumptions is likely to be reflected in different ways in the two equations.
It follows that substantial agreement between the two sets of
estimates gives qualitative support to the assumptions.

The data used in this example are not records for the whole Tignish
area and therefore these population estimates do not refer to the
whole area. It is, perhaps, worth remarking that if, in addition to the
catch-effort records used here, a record of total daily catch or total
daily effort were available for the whole area, then equation (5) or
equation (4) would be fitted to obtain an estimate of the total population,
assuming, of course, that the values of C(t) computed from the
sample and the catchability in the sample area are representative of the
entire area. Under slightly stronger assumptions, the total catch or
the total effort for the whole area could be used to expand the population
estimate for the sample area to one for the whole area.

No evidence is available to check the accuracy of the above estimates,
but they agree in a general way with what is known about this
lobster fishery.
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Example 2. An Application to Some Speckled Trout Records. 
Figures 4, 5 and 6 show log10 C(t) plotted against E(t) for the daily
catch-effort records made on speckled trout in Montague Pond, Prince
Edward Island, during the years 1944, 1945 and 1946. Catch is measured
by the number of fish caught each day and the corresponding effort
is given in rod-hours.

The three graphs are seen to be similar in some respects. Most
striking is the more or less steady decline of log10 C(t) with E(t) down
to a well-defined minimum, followed by a pronounced upward trend.
The day-to-day fluctuation of C(t) is very large, which seems to imply
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that the assumption of constant catchability is not satisfied. This 
fluctuation may, however, be regarded as exhibiting non-uniformity
in the unit of effort. In any case, the assumption of constant catchability
should provide a usable description of the behaviour of the
population provided there are no substantial trends in the values of
the catchability. An attempt will therefore be made to account for
the behaviour of C(t) according to the following assumptions.


1. From the opening of the fishing season until the minimum was 
reached, recruitment was negligible and the catchability was 
constant. 

2. From the date on which the minimum was reached until the 
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end of the fishing season, the catchability was constant but the 
population was continually being built up by recruitment. 

The selection of the date on which the minimum is reached is arbi- 
trary, but in these examples this is not serious. The dates chosen are 
shown on the graphs. 

These assumptions permit the fitting of straight lines to the first 
portions of the graphs and the estimation of N(0) and k, exactly as in
Example 1. The results of the calculations are shown in Table III. 

TABLE III

                        Estimates (equation 4)     Estimates (equation 5)     Calculated
Period                  N(0)    k                  N(0)    k                  decrease     Catch

1944  May 2-Sept. 5     5761    6.874 x 10^-4      5801    7.256 x 10^-4      3569         3471
1945  Apr. 17-June 24   3025    1.313 x 10^-3      3266    1.214 x 10^-3      2183         2185
1946  Apr. 19-Aug. 6    6014    6.531 x 10^-4      6287    6.207 x 10^-4      3375         3087


The estimates for 1945 contrast sharply with those for 1944 and
1946. The catchability in 1945 is about double those of the other
years and the population estimate is roughly one-half those of 1944
and 1946. These discrepancies can be accounted for by the assumption
that about half of the catch and the corresponding effort were not
reported during 1945. This question cannot be settled here, although
probably it could be if tagging records were available, but it raises
an interesting possibility for the use of these methods: the detection
of poaching! Natural mortality, of course, produces the same effects
as unreported catch.
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Turning now to the portions of the graphs between the minimum 
and the end of the season, neither the assumptions nor the plotted
points indicate any specific procedure as being appropriate in trying
to account for the behaviour of C(t). The assumptions are not sharp
enough to require that these points follow any particular curve and
the plotted points are so widely spread out that they do not indicate
a curve of any definite form. However, the points show clear evidence
of trend (except in 1944, when they are too few in number to show
anything) and since there is nothing to indicate a curvilinear relationship,
a straight line has been fitted to exhibit the trend. The 1946
records only are used in this illustration.

The equation of the straight line fitted to the 1946 records from
August 6 to September 14 is


     log10 C(t) = 0.32022 + 0.00018159 E(t),


where E(t) is measured from the end of August 6. Assuming that the
catchability has the same constant value as in the earlier part of the
season, it follows from the development of Appendix I [formula (7)]
that the population varies according to the equation

     N(t) = 3200.5 x 10^{0.00018159 E(t)}.


According to this equation, the population increased from 3200 at the
end of August 6 [equation (4) estimates this number at 2639] to 3724
at the end of September 14. The catch calculated from this equation
[Appendix I, formula (8)] is 817, whereas the actual catch is 711.
The calculated recruitment is therefore 1340. The rate of recruitment
is calculated to be about 1.2 per cent.
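
The figures in this paragraph can be reproduced approximately from the fitted line. The sketch below rests on assumptions that are not spelled out in this section: the slope 0.00018159 is read as a common-logarithm slope, so that the population grows by a factor of 10 raised to 0.00018159 E(t); the catch is taken as the integral of k N over the effort expended; and recruitment is the gain in population plus the catch. Formulas (7) and (8) of Appendix I are not reproduced here, so this is an illustration rather than the author's own computation, and the variable names are invented for the example.

```python
import math

k = 6.531e-4          # 1946 catchability from the equation (4) fit (Table III)
slope = 0.00018159    # slope of the fitted line, common logarithms, per rod-hour
n_start = 3200.5      # population at the end of August 6
n_end = 3724.0        # population at the end of September 14

growth_rate = slope * math.log(10)      # d(ln N)/dE implied by the fitted line
effort_needed = math.log(n_end / n_start) / growth_rate   # rod-hours implied

# With N growing exponentially in E, the integral of k N dE over the period
# reduces to (k / growth_rate) * (n_end - n_start).
catch = (k / growth_rate) * (n_end - n_start)
recruitment = (n_end - n_start) + catch

print(f"effort = {effort_needed:.0f} rod-hours, catch = {catch:.0f}, "
      f"recruitment = {recruitment:.0f}")
```

Under these assumptions the calculated catch and recruitment fall within a unit or two of the quoted 817 and 1340.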

This treatment of the second portion of the fishing season is most 
unsatisfactory, based as it is on a set of speculations which cannot be 
tested with the available data, and is given here only to show one of 
the several types of computation which might be employed in such 
cases. 

These data do not, in themselves, give any assurance that the upward
trend in the log C(t) - E(t) curve is the result of recruitment
rather than, for example, an increase in the catchability, although
general experience does support this assumption. This point could be 
settled if the population had contained enough tagged individuals to 
show up well in the catches at the time when the upward trend began. 
Granted that recruitment does take place, it may be important to know 
if it is supplied by immigration or through growth. Presumably some 
information on this question would be obtained by breaking down the 
catches according to size. Indeed, with these two additions to the 
catch-effort data, many sources of perplexity would be removed. 

The one possibility of an independent check on these population 
estimates lies in some tagging records which were taken along with the 
1944 catch-effort data. On May 15, 1944, 178 marked trout were re- 
leased in the pond and all recaptures were recorded. The recapture 
records for each day have been used to estimate the size of the popu- 
lation on that day. The results of these calculations are exhibited in 
Figure 7. The solid line represents the size of the population com- 
puted from the population equation. 

The population estimates based on the recapture of marked fish have
very large sampling errors, since each one is based on a small number
of recaptures. This is reflected in the large dispersion of the plotted
points, but even so, certain facts emerge clearly. The population estimates
based on recaptures of marked fish show a definite downward
trend as the season progresses and these estimates are persistently
lower than those given by the population equation.
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There is no reason to regard the estimates obtained from the marked 
fish as more trustworthy than those calculated from the population 
equation. Indeed, most of them are absurdly low, in view of the num- 
ber of fish taken from the pond during this period. In any event, it is 
desirable to account for the discrepancy between the estimates given by 
the two methods. 

The bias usually feared in tagging experiments arises from the pos- 
sibility that a number of the marked individuals die as a result of the 
marking. It is not known whether or not any of these marked fish 
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died, but in any case, this would result in estimates that are too large. 
A bias of another sort is needed to account for estimates that are per- 
sistently low. 

If, for any reason, the marked fish are more likely to be caught
than are the unmarked fish, population estimates based on their recapture
will be too low. The assumption of such an effect may seem to
be unwarranted, but in the case under consideration, something of this
kind may well have happened. The fish to be marked were caught
by the same method as was used to capture them later on and if some
of the fish were more ‘‘catchable’’ than others, the group of marked
fish presumably contained a larger proportion of ‘‘highly catchable’’
fish than did the population as a whole.
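
The direction of this bias can be illustrated with a small calculation. The paper does not state which recapture estimator was used; the sketch below assumes the simple ratio (Petersen-type) estimate, marked fish times total catch divided by recaptures, and works with expected catches. Apart from the 178 marked trout, every number is hypothetical and the function names are invented for the example.

```python
def petersen_estimate(marked, catch_total, catch_marked):
    """Ratio estimate of population size: marked * catch / recaptures."""
    return marked * catch_total / catch_marked


true_population = 5000   # hypothetical population size
marked = 178             # marked trout released (from the text)
effort = 40.0            # hypothetical rod-hours fished on one day
k_unmarked = 7.0e-4      # hypothetical catchability of unmarked fish


def expected_estimate(k_marked):
    """Estimate computed from expected catches under constant catchabilities."""
    recaptures = k_marked * effort * marked
    unmarked_catch = k_unmarked * effort * (true_population - marked)
    return petersen_estimate(marked, recaptures + unmarked_catch, recaptures)


equal = expected_estimate(k_unmarked)        # marked and unmarked alike
doubled = expected_estimate(2 * k_unmarked)  # marked fish twice as catchable
print(round(equal), round(doubled))
```

With equal catchabilities the expected estimate recovers the assumed population, while doubling the catchability of the marked fish roughly halves the estimate of the unmarked part of the population.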

With this possibility in mind, it is of interest to apply to the subpopulation
of 178 marked fish the same methods of estimation as were
used for the whole population, even though it is not to be expected that
the methods would perform well because of the smallness of all the
numbers involved. On a number of days, no marked fish were caught,
which makes it necessary to group together sets of consecutive records
so as to avoid zero values of C(t).

The following estimates are obtained by fitting equations (4) and
(5) to these grouped data.


     Equation (4):   N(0) = 187,   k = 1.0406 x 10^-3
     Equation (5):   N(0) = 145,   k = 1.1341 x 10^-3


It is particularly interesting to find that the catchability of marked 
fish is almost double that of the population as a whole. Assuming that 
this is not accidental, it may well be simply a size effect and might 
disappear if these computations were performed for fish of a given size. 


Summary and Conclusions. An outline is given of a procedure for 
obtaining, from catch-effort records, estimates of the population size 
and certain important parameters. The method depends on the selec- 
tion of appropriate assumptions to account for the observed behaviour 
of the population. The choice of assumptions may often be difficult to 
make, inasmuch as several different assumptions may provide reason- 
able explanations of the data. It has been suggested in this article 
that population investigations should be so designed as to furnish infor- 
mation which will distinguish among the effects of the several plausible 
assumptions. 

Probably biologists will contend that the assumptions made in the
examples of this article are too simple and cannot hold even approximately
in the populations which have been discussed. This may well
be the case. For example, catchability undoubtedly varies with size
and, in some instances, with other factors as well. Thus, the constant
catchability used in these examples must be regarded as an average
over all sizes which appear in the catch. This average will be useful
only if the distribution of size in the population remains constant, a
condition that usually will not be met. This particular difficulty can
be overcome by applying the methods to restricted size ranges, which
requires that the catches be recorded according to size. This device
might well lead to a picture of the distribution of size in the population
that is fairly free from bias, which cannot be accomplished (in
the case of fishes, at least) by direct sampling. It is quite possible,
therefore, that the proper use of stratification in an investigation will
produce data to which simple assumptions may safely be applied.

The assumption of constant catchability is not essential to the application
of these methods, although, of course, the treatment is simpler
when it can be made. Methods of this kind can be extended and
elaborated indefinitely, but are likely to have little value unless the
designs of the experimental investigations are correspondingly refined.
It is obvious, too, that the successful application of these methods requires
a thorough understanding of the biological factors which influence
the behaviour of the subjects of the study.

It should be worth while, during the course of a population inves-
tigation, to plot C(t) against K(t) and log C(t) against E(t) as the
records are taken, because these points will give immediate indication
of a change in the behaviour of the population and steps can be taken
at once to investigate the causes and character of the change.
APPENDIX I


Notation and a General Equation. 

N(t) represents the number of individuals in the population at
time t.

r(t) denotes the relative rate at which individuals are added to
the population at time t. Thus, the number of individuals
added to the population during the interval (t, t+dt) is
given to the first order by

N(t) r(t) dt.

The function r(t) is intended to cover such sources of variation
as migration, growth, ‘‘natural’’ mortality, etc. and may
take both positive and negative values.

d(t) represents the relative rate at which individuals are removed
from the population by the sampling method.
In terms of these symbols, the population increases, during the
interval (t, t+dt), by

N(t) [r(t) - d(t)] dt

individuals. This may be equated to

N(t+dt) - N(t).

Dividing by dt and passing to the limit as dt approaches zero, this
equality yields the relation

$\frac{dN(t)}{dt} = N(t)\,[r(t) - d(t)]$, or

(1) $\frac{d}{dt} \log N(t) = r(t) - d(t)$.


All the equations used in this paper are derived from special cases of 
equation (1). 


Some Special Cases. The developments of example 1 and part of ex- 
ample 2 rest on the following assumptions. 


(i) r(t) =0; 


(ii) d(t) =k(t) e(t), where e(t) represents the rate of expendi- 
ture of effort at time t; 


(iii) k(t) = k, a constant.


Equation (1) becomes, under these three assumptions,

(1’) $\frac{d}{dt} \log N(t) = -k\,e(t)$,

which can be integrated to give

(2) $\log N(t) - \log N(0) = -k \int_0^t e(t)\,dt = -k\,E(t)$,

where E(t) stands for the total effort expended during the inter-
val (0, t). The time origin may, of course, be put in any con-
venient place. Equation (2) may be written more compactly in
the form

(2’) $N(t) = N(0)\,e^{-k E(t)}$.

Now, from either of the equations (1’) and (2’), it follows that

(3) $\frac{dN}{dE} = -k\,N(0)\,e^{-k E(t)}$.
The derivative $dN/dE$, which represents the rate of increase of the
population per unit of effort expended, can be identified with the
negative of the catch per unit effort, in view of assumption (i).
Using C(t) to denote the catch per unit effort and writing $-C(t)$
for $dN/dE$, equation (3) becomes, on taking logarithms,

(4) $\log C(t) = \log [k\,N(0)] - k\,E(t)$.

Under assumption (i), the total catch during the interval
(0, t), denoted K(t), is given by

$K(t) = N(0) - N(t) = N(0) - N(0)\,e^{-k E(t)}$, by equation (2’).

Also, equation (3) under assumption (i) states that

$C(t) = k\,N(0)\,e^{-k E(t)}$.

Elimination of E(t) between these two equations yields the relation

(5) $C(t) = k\,N(0) - k\,K(t)$.

Thus, under assumptions (i), (ii), and (iii), equation (4) holds and
implies equation (5). Likewise, under the same assumptions, equation
(5) is easily shown to imply equation (4). Therefore, provided the
three assumptions are strictly fulfilled, equations (4) and (5) are
wholly equivalent.

It may be noted here that the relation

(6) $C(t) = k(t)\,N(t)$
holds under only the rather mild assumption (ii). Equation (5) 
is simply one form of equation (6) when assumptions (i) and (iii) 
also apply. 
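
The equivalence of equations (4) and (5) under assumptions (i) to (iii) can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names, the starting population, the catchability and the effort schedule are all hypothetical values chosen for illustration. It generates an error-free depletion series from equation (2’), fits both straight lines without weights, and recovers k and N(0) from each.

```python
import math

def simulate_depletion(n0, k, efforts):
    """Error-free depletion under r(t)=0 and d(t)=k*e(t) with k constant.

    Each row holds (E, K, C) at the start of a period: cumulative effort,
    cumulative catch, and catch per unit effort C = k*N (equation (6)).
    """
    rows = []
    cum_effort = 0.0
    for e in efforts:
        n_t = n0 * math.exp(-k * cum_effort)      # equation (2')
        rows.append((cum_effort, n0 - n_t, k * n_t))
        cum_effort += e
    return rows

def fit_line(xs, ys):
    """Unweighted least-squares line y = a + b*x; returns (a, b)."""
    m = len(xs)
    mx = sum(xs) / m
    my = sum(ys) / m
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical inputs: N(0) = 10,000, k = 0.002, seven periods of effort.
rows = simulate_depletion(10_000.0, 0.002, [25, 40, 30, 35, 20, 45, 30])
E = [r[0] for r in rows]
K = [r[1] for r in rows]
C = [r[2] for r in rows]

a4, b4 = fit_line(E, [math.log(c) for c in C])   # equation (4): log C on E
a5, b5 = fit_line(K, C)                          # equation (5): C on K

k_from_4, n0_from_4 = -b4, math.exp(a4) / (-b4)
k_from_5, n0_from_5 = -b5, a5 / (-b5)
print(k_from_4, n0_from_4, k_from_5, n0_from_5)
```

With error-free data both fits return the same k and N(0) up to rounding. With real records they generally differ, which is why the question of weighting taken up in Appendix III matters.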

When assumption (i) is dropped, as it was in the attempt to
account for the behaviour of C(t) during the later parts of the
seasons in example 2, equations (1’) to (5) no longer apply. If
a linear equation, log C(t) = a + b E(t) say, does describe ade-
quately the relation between log C(t) and E(t), it follows from
equation (6) that

(7) $k(t)\,N(t) = e^{a + b E(t)}$.

In equation (7), values of a and b are determined by the data,
but the function k(t) is unknown. A value was assigned to k(t)
in example 2 by assuming, not only that k(t) was constant, but
that its value was the same during the later part of the season
as it had been during the earlier part. No particular justification
can be offered for these assumptions, but this is the sort of ques-
tion that can be provided for in the design of an investigation.

After a constant value has been assigned to k(t), equation (7)
furnishes an explicit formula for N(t):

(7’) $N(t) = \frac{1}{k}\,e^{a + b E(t)}$.


If this formula gives an adequate description of the population, 
several computations are of interest. 

The total catch during the period in which (7’) applies is
given by the following integral, evaluated over the whole of this
period.

(8) $\int k\,N(t)\,e(t)\,dt = \frac{1}{b}\,e^{a + b E(t)}$,

where E(t) here represents the total effort applied during this
period.


The rate of recruitment can be calculated as follows. Equa-
tion (1) becomes, under the assumptions in use here,

(9) $\frac{d}{dt} \log N(t) = r(t) - k\,e(t)$,

and from equation (7’),

(10) $\log N(t) + \log k = a + b\,E(t)$.

Differentiating with respect to t in equation (10) and using this
result to eliminate $\frac{d}{dt} \log N(t)$ from (9), it is found that

(11) $r(t) = (b + k)\,e(t)$.

It is, of course, unreasonable to express the rate of recruitment as
a function of the intensity of fishing. However, e(t), in spite of
fluctuations from day to day, contributes to the equations in a
virtually constant manner, and therefore r(t) is effectively con-
stant.

The total recruitment for this period is given by

(12) $\int r(t)\,N(t)\,dt = (b + k) \int N(t)\,e(t)\,dt = \left(1 + \frac{b}{k}\right)$ times the total catch.
Clearly these computations can have little value in the present
examples, because they are based upon inadequate information. It is to
be expected, however, that when the requisite information is available, 
calculations of this nature may have some importance. 
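
A short numerical sketch of the late-season computations may make the bookkeeping concrete. It assumes a fitted line log C(t) = a + b E(t) with b not equal to zero and an assigned constant k. The function name and the numbers are hypothetical. The total catch is taken as the antiderivative (1/b) exp(a + b E) evaluated between the two effort limits, and recruitment as (1 + b/k) times that catch.

```python
import math

def late_season_summary(a, b, k, effort_start, effort_end):
    """Population, catch and recruitment implied by log C = a + b*E.

    Assumes constant catchability k > 0 and a nonzero slope b; effort is
    measured on the same scale as in the fitted line.
    """
    if b == 0 or k <= 0:
        raise ValueError("requires b != 0 and k > 0")

    def cpue(effort):
        return math.exp(a + b * effort)           # C = k*N on the fitted line

    catch = (cpue(effort_end) - cpue(effort_start)) / b   # integral of k*N*e dt
    recruitment = (1.0 + b / k) * catch                   # total recruitment
    return {
        "n_start": cpue(effort_start) / k,                # N at start of period
        "n_end": cpue(effort_end) / k,                    # N at end of period
        "catch": catch,
        "recruitment": recruitment,
        "r_over_e": b + k,                                # r(t)/e(t)
    }

# Hypothetical values: C falls from 20 per unit effort, slope -0.001, k = 0.002.
summary = late_season_summary(math.log(20.0), -0.001, 0.002, 0.0, 500.0)
print(summary)
```

A useful check on any such calculation is that the change in population over the period equals recruitment minus catch, which the sketch satisfies by construction.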


APPENDIX II 


The Fitting of the Straight Lines in Examples 1 and 2. The symbol 
C(t), as used in Appendix I, represents an instantaneous rate, the rate 
of capture of each unit of effort at the instant t. Values of C(t) have 
been approximated from the data by means of ratios of differences, 
that is, (daily catch) /(daily effort), which may be regarded as daily 
averages of the values of C(t). The values of K(t) and E(t) which
correspond to these average values of C(t) should presumably be
calculated up to points located somewhere between the beginnings and the
ends of the days, but in practice, this is impossible. The catch and the
effort corresponding to the value of C(t) for a given day must be
accumulated up to that day, or up to and including that day. The
first of these alternatives has been used in this paper. 
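
The tabulation described above can be sketched as follows. This is an illustration only: the function name and the three days of records are hypothetical. Each day's C is the ratio of daily catch to daily effort, and K and E are accumulated up to, but not including, that day.

```python
def catch_effort_table(daily_catch, daily_effort):
    """Build (C, K, E) rows from daily records.

    C is the daily catch per unit effort; K and E are the catch and effort
    accumulated over all earlier days, excluding the current day.
    """
    rows = []
    cum_catch = 0.0
    cum_effort = 0.0
    for catch, effort in zip(daily_catch, daily_effort):
        rows.append({"C": catch / effort, "K": cum_catch, "E": cum_effort})
        cum_catch += catch
        cum_effort += effort
    return rows

# Hypothetical records for three days.
table = catch_effort_table([50, 40, 30], [10, 10, 10])
for row in table:
    print(row)
```

Accumulating up to and including the day would shift every K and E by that day's catch and effort. The choice affects the intercept of the fitted lines more than their slopes.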

The choice of an appropriate method of fitting equations (4) and 
(5) cannot be made satisfactorily until a statistical model is con- 
structed to give an adequate description of the behaviour of the popu- 
lation. Lacking such a model, equations (4) and (5) have been fitted 
as if they were regression equations, with E(t) and K(t) treated as
independent variables and log C(t) and C(t) as dependent variables.
This procedure appears to be not unreasonable, except that the question 
of weighting deserves attention. Some discussion of this point is given 
in Appendix III. 

APPENDIX III 


A Simple Statistical Model. A population of N individuals may be 
represented by a set of N identical white beads in a box. A unit of 
effort may be introduced as a dip with a small scoop, which removes a 
number of the beads. These beads are counted and replaced by an 
equal number of red beads. The red and white beads are thoroughly 
mixed in the box and another scoopful of beads is removed. Any red 
beads taken by the scoop are returned to the box; the white beads are 
counted and replaced by an equal number of red beads. This process 
is continued. 

The red beads are not considered to belong to the population, but 
are used only as ‘‘fillers’’, to ensure that a unit of effort always takes 
the same proportion of the population. The red beads could, of course, 
be regarded as tagged members of the population and population
estimates could be based on the proportions of red beads in the samples.

Two sources of ‘‘error’’ may be expected in this sampling scheme. 

(i) The proportion of white beads in the sample will usually be dif-
ferent from the proportion of white beads in the population.
Discrepancies arising from this source will be called ‘‘sampling errors’’.

(ii) The total number of red and white beads taken will vary from
one sample to another. This may be regarded either as a variation in
catchability or as a variation in the unit of effort. Indeed, these two
statements are identical. The two sources of variation (i) and (ii)
will be considered separately, even though they are not wholly
independent.


(i) Assume for the present that each dip with the scoop takes out n
red and white beads, so that the catchability is strictly constant. Let
the proportion of white beads in the box just before the t-th sample is
taken be $p_t$. Assume further that the ratio of n to N is so small that
the distribution of white beads in samples of n may be closely approxi-
mated by a binomial distribution. Then, if C(t) is the number of
white beads taken in the t-th sample, the mean value and variance of
C(t) are

$\mathcal{E}\,C(t) = n\,p_t$,  $\mathrm{Var}\,C(t) = n\,p_t\,(1 - p_t)$.


If K(t) white beads have been removed in the first (t-1) samples,
then $p_t = 1 - K(t)/N$ and the mean value of C(t), conditional upon the
total number of white beads taken in the first (t-1) samples, is

$\mathcal{E}[C(t) \mid K(t)] = n - \frac{n}{N}\,K(t)$.

Evidently, n/N is the catchability, k, and therefore

(5’) $\mathcal{E}[C(t) \mid K(t)] = kN - k\,K(t)$.


This is the form taken by equation (5) for this statistical model. 
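
The bead model is easy to simulate, and a simulation gives a direct check on equation (5’). The sketch below is illustrative only: the function name, the population of 5000 beads, the scoop of 50 and the seed are hypothetical, and the number of white beads in each scoop is drawn from the binomial approximation used above. It averages the departures of C(t) from kN - k K(t) over many runs.

```python
import random

def simulate_beads(n_pop, scoop, n_samples, rng):
    """One run of the bead model; returns a list of (K, C) pairs.

    K is the number of white beads removed before the sample and C the
    number of white beads in the sample, drawn as binomial(scoop, 1 - K/N).
    """
    removed = 0
    records = []
    for _ in range(n_samples):
        p_white = 1.0 - removed / n_pop
        catch = sum(rng.random() < p_white for _ in range(scoop))
        records.append((removed, catch))
        removed += catch
    return records

N_POP, SCOOP = 5000, 50          # hypothetical N and n, so k = n/N = 0.01
rng = random.Random(1947)        # fixed seed so the run is repeatable
k = SCOOP / N_POP

residuals = []
for _ in range(200):
    for K_t, C_t in simulate_beads(N_POP, SCOOP, 20, rng):
        residuals.append(C_t - (k * N_POP - k * K_t))   # C minus its conditional mean

mean_residual = sum(residuals) / len(residuals)
print(mean_residual)
```

The mean departure should be close to zero, while individual departures scatter with the binomial variance given above.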

Since no question of the sampling behaviour of K(t) enters into
equation (5’), it follows from least squares theory that the linear esti-
mates of the parameters (kN) and k which have minimum variance
are those which minimize

$\sum_t w(t)\,[C(t) - kN + k\,K(t)]^2$,

where the weighting factors w(t) are inversely proportional to

$\frac{K(t)}{N}\left[1 - \frac{K(t)}{N}\right]$.

Thus the estimation of (kN) and k involves only the fitting of a
weighted regression of C(t) on K(t).

The relation of C(t) to E(t), analogous to equation (4), seems to
be less obvious. It is clear, though, that such a relation must have a
somewhat different character from that of C(t) to K(t), because the
sampling behaviour of C(t), given K(t), depends only on the value
of K(t), whereas that of C(t), given E(t), must involve the whole
stochastic process up to the t-th sample. For this reason, it is necessary
to compute the unconditional mean of C(t). The following relations
yield this unconditional mean.

The conditional distribution of C(t) given K(t) is specified by

$P[C(t) = x \mid K(t)] = \binom{n}{x} \left[1 - \frac{K(t)}{N}\right]^{x} \left[\frac{K(t)}{N}\right]^{n-x}$,

where $\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$. Therefore,

$P[C(t+1) = z \mid K(t)] = \sum_{x=0}^{n} P[C(t+1) = z,\ C(t) = x \mid K(t)]$

$= \sum_{x=0}^{n} P[C(t+1) = z \mid K(t) + x]\; P[C(t) = x \mid K(t)]$.

Using this relation,

$\mathcal{E}[C(t+1) \mid K(t)] = \sum_{z=0}^{n} z\, P[C(t+1) = z \mid K(t)]$

is calculated, by direct evaluation of this sum, to be

$n\left(1 - \frac{n}{N}\right)\left[1 - \frac{K(t)}{N}\right] = (1-k)\,[kN - k\,K(t)] = (1-k)\,\mathcal{E}[C(t) \mid K(t)]$.

A continuation of this type of calculation yields the relation

$\mathcal{E}[C(t+y) \mid K(t)] = (1-k)^{y}\, \mathcal{E}[C(t) \mid K(t)]$,

which holds for t = 1, 2, 3, ... and y = 0, 1, 2, 3, ....

When t = 1, K(t) = 0, $\mathcal{E}[C(1) \mid K(1)] = n = kN$
and $\mathcal{E}[C(y+1) \mid K(1)] = kN(1-k)^{y}$
is the unconditional mean of C(y+1). Therefore,

$\mathcal{E}\,C(t) = kN(1-k)^{t-1} = kN(1-k)^{E(t)}$,

since E(t) = t-1 in this case. Taking logarithms,

$\log \mathcal{E}\,C(t) = \log (kN) + E(t) \log (1-k)$.

This is the equivalent of equation (4). log (1-k) may be replaced
by -k without sensible error, since k has been assumed to be very small.
The fitting of this equation to observations falls outside the scope 
of regression theory. The line of best fit, in the least squares sense, 
is obtained by the same calculation as would be used in fitting the 
weighted regression of log C(t) on E(t), since E(t) is known without
error, but questions involving the estimation of parameters are com- 
plicated. The weights to be used are inversely proportional to 


K(t)/[N-K(t)] 


(ii) When the catchability varies from one sample to another, as 
would be the case in this model when the total number of beads taken 
by the scoop varies from sample to sample, two cases may be considered. 

In sampling from the population of beads, the value of n, the total 
number of beads, could be recorded for each sample and exact allow- 
ance could be made for variations in the values of n. No new question 
is involved in this case and nothing of interest can be found, because 
in practice nothing equivalent to the value of n can be observed. 

When values of n or their equivalents cannot be recorded, it seems
not unreasonable to treat n as a random variable, with mean $n_0$ and
variance $\sigma_n^2$. Then the value of C(t), given K(t), can be expressed in
the form

$C(t) = n_0 (1 + \delta_t)\left(1 - \frac{K(t)}{N}\right) + \Delta_t$,

where $\delta_t$ and $\Delta_t$ are random variables with zero means. These variables
are not independent, but their covariance is zero. Their variances are

$\frac{\sigma_n^2}{n_0^2}$ and $n_0 \left(1 - \frac{K(t)}{N}\right) \frac{K(t)}{N}$. Hence,

$\mathrm{Var}[C(t) \mid K(t)] = \left(1 - \frac{K(t)}{N}\right)^{2} \sigma_n^2 + n_0 \left(1 - \frac{K(t)}{N}\right) \frac{K(t)}{N}$

$= \sigma_n^2 \left(1 - \frac{K(t)}{N}\right)\left(1 + \lambda\,\frac{K(t)}{N}\right)$, where $\lambda = \frac{n_0 - \sigma_n^2}{\sigma_n^2}$.

The weighting factors, therefore, are proportional to

(13) $\left[\left(1 - \frac{K(t)}{N}\right)\left(1 + \lambda\,\frac{K(t)}{N}\right)\right]^{-1}$.

The value of $\sigma_n^2$ or its equivalent would not ordinarily be known
and the values of $\lambda$ could conceivably range from -1 to $+\infty$. When
$\lambda = -1$, the variation in catchability is so large as to completely over-
shadow the sampling errors; $\lambda = +\infty$ corresponds to case (i) discussed
above and would not occur in practice. It may be guessed that, as a
rule, $\lambda$ will lie between -1 and +1. $\lambda$ takes the value +1 if n follows
a symmetrical binomial distribution and vanishes if n follows a Poisson
distribution. Furthermore, the dependence of the weighting factors on
the value of $\lambda$ becomes important only after the ratio K(t)/N reaches
a sizable magnitude. Therefore in many, perhaps most, cases, weights
proportional to

(14) $\left(1 - \frac{K(t)}{N}\right)^{-1}$

will serve reasonably well.
The corresponding weights for fitting log C(t) are given by
The value of N to use in the weighting formula would not be known
in practice, but its value could be approximated by an unweighted fit-
ting or graphically by a line fitted freehand to the (C(t), K(t)) points.
Further accuracy could be reached by using the value of N, obtained
from the weighted fitting, to compute adjusted weights, which would 
then be used for a second weighted fitting. This iterative process can 
be continued, but such attempts at precision would not generally be 
required. Indeed, estimates based on an unweighted fitting should be 
good enough for many purposes. 
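
The iterative procedure just described can be sketched in a few lines. This is an assumption-laden illustration rather than the author's computation: the function names and data are hypothetical, and the weights are taken proportional to 1/(1 - K(t)/N), the simplified form suggested above for the case in which the variation in catchability is unknown.

```python
def weighted_line(xs, ys, ws):
    """Weighted least-squares line y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def iterated_fit(K, C, rounds=2):
    """Fit C = a + b*K, re-weighting with the current estimate of N.

    Starts from an unweighted fit, then repeats the weighted fit `rounds`
    times with weights 1 / (1 - K/N_hat). Returns (k_hat, N_hat).
    """
    a, b = weighted_line(K, C, [1.0] * len(K))
    for _ in range(rounds):
        if b >= 0:
            raise ValueError("slope must be negative for a depleting population")
        n_hat = -a / b
        if max(K) >= n_hat:
            raise ValueError("cumulative catch exceeds the estimated N")
        weights = [1.0 / (1.0 - k_t / n_hat) for k_t in K]
        a, b = weighted_line(K, C, weights)
    return -b, -a / b

# Hypothetical error-free data generated from k = 0.1 and N = 1000.
K_obs = [0.0, 100.0, 190.0, 271.0]
C_obs = [100.0, 90.0, 81.0, 72.9]
k_hat, n_hat = iterated_fit(K_obs, C_obs)
print(k_hat, n_hat)
```

On error-free data the weights make no difference and the sketch simply returns the generating values. With noisy records one or two rounds usually suffice, in line with the remark above that further precision is seldom required.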

For this statistical model, estimates of N and k based on equation
(5’) can be supplied with standard errors by ordinary regression
theory. If the fitted equation is written C(t) = a + b K(t) and if $c_{11}$,
$c_{12}$, $c_{22}$ denote the elements of the matrix inverse to that of the normal
equations which determine a and b, then the estimated variances of
a and b are given by $s^2 c_{11}$ and $s^2 c_{22}$, where $n s^2$ is the residual sum of
squares and n+2 is the number of pairs of observations to which the
equation is fitted. Their estimated covariance is $s^2 c_{12}$.
The estimates of k and N are $-b$ and $-a/b$. The standard error of the
estimate of k is therefore $s\sqrt{c_{22}}$ and a first order approximation to that
of N is given by

$\frac{s}{b^2} \sqrt{c_{11} b^2 - 2 c_{12}\,a b + c_{22} a^2} = \frac{s}{|b|} \sqrt{c_{11} + 2 c_{12} N + c_{22} N^2}$.


If normal theory is assumed to apply here, fiducial limits may be
attached to k in the ordinary way, that is, by calculating

$-b \pm t\,s \sqrt{c_{22}}$,

where t is taken at the desired confidence level with n degrees of free-
dom. Fiducial limits for N are given by the roots of the quadratic
equation

$N^2 (b^2 - t^2 s^2 c_{22}) + 2N (ab - t^2 s^2 c_{12}) + (a^2 - t^2 s^2 c_{11}) = 0$,

provided $b^2 - t^2 s^2 c_{22} > 0$.

Even though the distribution of C(t) may depart somewhat from 
normality, these fiducial limits should be useful. 
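
The standard errors and fiducial limits can be computed directly from the normal equations. The sketch below is a minimal illustration under stated assumptions: the function name and the five data pairs are hypothetical, the fit is unweighted unless weights are supplied, and the value of t (3.182, the two-sided 95 per cent point for 3 degrees of freedom) is entered by hand rather than looked up by the code.

```python
import math

def fit_with_limits(K, C, t_value, weights=None):
    """Fit C = a + b*K and return estimates, standard errors and limits for N."""
    m = len(K)
    w = weights if weights is not None else [1.0] * m
    sw = sum(w)
    swx = sum(wi * x for wi, x in zip(w, K))
    swxx = sum(wi * x * x for wi, x in zip(w, K))
    swy = sum(wi * y for wi, y in zip(w, C))
    swxy = sum(wi * x * y for wi, x, y in zip(w, K, C))

    det = sw * swxx - swx ** 2
    c11, c12, c22 = swxx / det, -swx / det, sw / det    # inverse of normal matrix
    a = c11 * swy + c12 * swxy
    b = c12 * swy + c22 * swxy

    rss = sum(wi * (y - a - b * x) ** 2 for wi, x, y in zip(w, K, C))
    s2 = rss / (m - 2)                                  # residual mean square

    k_hat, n_hat = -b, -a / b
    se_k = math.sqrt(s2 * c22)
    se_n = math.sqrt(s2 * (c11 + 2 * c12 * n_hat + c22 * n_hat ** 2)) / abs(b)

    q = t_value ** 2 * s2
    qa = b * b - q * c22                                # coefficient of N^2
    if qa <= 0:
        raise ValueError("slope not significantly different from zero")
    qb = a * b - q * c12                                # half the coefficient of N
    qc = a * a - q * c11
    root = math.sqrt(max(qb * qb - qa * qc, 0.0))
    lower, upper = sorted(((-qb - root) / qa, (-qb + root) / qa))
    return {"k": k_hat, "N": n_hat, "se_k": se_k, "se_N": se_n,
            "N_lower": lower, "N_upper": upper}

# Hypothetical records scattered about C = 0.1 * (1000 - K).
K_obs = [0.0, 100.0, 190.0, 271.0, 344.0]
C_obs = [101.0, 89.0, 82.0, 72.0, 66.0]
result = fit_with_limits(K_obs, C_obs, t_value=3.182)
print(result)
```

The limits for N are not symmetric about the estimate, because N is a ratio of two regression coefficients. The asymmetry grows as the slope becomes less well determined.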

In sampling from this statistical model, each sample is based on 
one unit of effort, a restriction which is likely to be seriously violated 
in practice. The disturbance thus introduced into the weights could 
be removed, either by grouping together sets of consecutive records into
groups containing approximately equal amounts of effort, or by re- 
placing the weighting factors w(t) by e(t) w(t). 

The conclusions reached for this statistical model may be summed
up as follows: estimates of N and k should be calculated from equation
(5), fitted using suitable weights, rather than from equation (4).
Standard errors of these estimates may then be calculated.

Doubtless this model ignores many factors which are important 
in natural populations. Much work, both theoretical and experi- 
mental, is needed to provide acceptable models, not only for the meth- 
ods suggested here, but for all investigations into biological populations. 
PROCEEDINGS OF THE FIRST INTERNATIONAL
BIOMETRIC CONFERENCE 


Marine Biological Laboratory, Woods Hole, Mass. 
Friday Morning, September 5, 1947 


As Chairman of the Organizing Committee, Dr. C. I. Bliss opened 
the meeting and called on Dr. Charles Packard, Director of the Marine 
Biological Laboratory, who spoke as follows: 


WELCOME TO THE MARINE BIOLOGICAL LABORATORY 
Charles Packard, Director


It is a pleasure to welcome, on behalf of the Marine Biological Labo- 
ratory, the First International Biometric Conference and to greet old 
friends and new who are attending it. We are glad to have meetings 
of this kind here in Woods Hole for they bring scientists whose primary 
interests in the general field of Biology are somewhat different from our
own. It may be that your presence here will stimulate our investiga- 
tors to apply those biometric principles which will aid them in formu- 
lating and interpreting their results. 

When this Laboratory was founded 60 years ago the Director, Dr. 
Whitman, looked forward to the time when all kinds of biological re- 
search should be carried on here; not only morphology and embryology 
which were then the chief topics, but also less developed fields including 
Bacteriology and Plant and Animal Breeding. Were he alive today 
he would certainly put Biometrics high in the list of desirable studies, 
for much of the work here now is quantitative in nature.

This Laboratory has grown from small beginnings in a little 
wooden building where a few investigators and students gathered each 
summer, until now it can accommodate 500 investigators, assistants, 
and students. With this growth has come specialization. We have a 
great library, probably the most useful and the most used library to be 
found anywhere. Our Apparatus and Chemical Departments provide 
our workers with all the usual tools of research; the Supply Depart- 
ment brings in quantities of material every day for the investigators 
and students; we have also a machine shop, a glass blowing shop, and a
microfilm service. All of these departments are open for your in- 
spection. 

This is a democratic community where students and young inves- 
tigators feel free to ask advice from more experienced workers, and to 
meet them informally at the Mess and on the bathing beach. This is
one of the delightful aspects of life here, and is a reason why our people 
return year after year. We hope that the members of this Conference 
also will frequently come back to Woods Hole. 


* * * * *


Dr. Bliss then reported that he had asked Doctors A. Linder, Carl 
F. Kossack and H. W. Lindstrom to serve as a committee to nominate 
permanent officers of the Conference. This committee proposed the 
following nominees: Chairman, Georges Teissier; Co-Chairman, R. A. 
Fisher; Secretary, C. I. Bliss. Dr. N. Rashevsky seconded the
nominations and the above officers were elected.

In the absence of Dr. Teissier, Professor Fisher took the chair and 
called upon Dr. Bliss for the report of the Organizing Committee. 


REPORT OF THE ORGANIZING COMMITTEE 


The first step leading to the present Conference was the appoint- 
ment on April 8 of a committee of four by D. B. DeLury, Chairman of 
the Biometrics Section of the American Statistical Association. Dr. 
DeLury noted that the International Statistical Conferences in Wash- 
ington made no specific provision for biometry. Since statisticians 
interested in biological statistics would be in America for the Wash- 
ington meetings, he asked the committee to arrange a conference at 
which international cooperation in biometry could be discussed and 
placed on an effective basis. In accepting this assignment, the Organ- 
izing Committee added to its number so as to be broadly representative 
of quantitative biology in its statistical and mathematical aspects. 
Two representatives were named by the National Research Council. 

Financial support was provided by the Rockefeller Foundation 
which contributed $1,000 to meet the travelling expenses of delegates 
from overseas between New York and Woods Hole and other costs. 
The Marine Biological Laboratory agreed to receive these funds and 
disburse them on order of the Organizing Committee. Any surplus 
reverts to the Rockefeller Foundation. 

In preparing the list of foreign invitees to the Conference, we had 
the active cooperation of the Joint Arrangements Committee for the 
International Statistical Conferences in Washington. Invitations were 
sent to those planning to attend the sessions in Washington, who we 
believed might be interested in this Conference. To this list we added 
the names of other prominent foreign biometricians, who were then 
invited to the Washington meetings by the Joint Arrangements Com-
mittee. Invitations to the Biometric Conference were sent to 52 scien- 
tists representing 18 different countries. The Organizing Committee 
also sent individual letters of invitation to 157 American and Cana- 
dian scientists and announced the Conference in Science and elsewhere. 

A major objective of the present Conference is to lay the founda- 
tion for future cooperation in biometry. International cooperation 
in other sciences has taken various forms, ranging from triennial
congresses arranged by small committees appointed by the preceding
congress to formal organizations with individual scientists as members.
Dr. Stuart Mudd, of the International Union of Biological Sciences, 
has urged the advantages of formal organization and recognition by 
UNESCO, which would make us eligible for financial aid in respect to 
future meetings. A letter from Dr. Joseph Needham, Head of the 
Division of Natural Sciences in UNESCO, expressed regret that he 
could not attend personally, wished the Conference every success and 
hoped it would be the first of many. He has also asked whether we 
‘‘intend to set up an international committee or society of a permanent
nature to concern itself with this subject (biometry) and these
conferences.’’ A letter from Dr. Stuart Rice notes that the proposed 
reorganization of the International Statistical Institute provides for 
affiliation by international organizations in specialized fields of appli- 
cation. Saturday morning has been set aside for discussing this prob- 
lem and for forming some continuing international biometric organi- 
zation if that is the wish of the Conference. 

The Organizing Committee is especially indebted to our host, the 
Marine Biological Laboratory. The arrangements which have been 
made with its cooperation are indicated on the printed program, to- 
gether with a brief sketch of the biological institutions in Woods Hole 
which make it a major center of biological activity in the United States, 
and a logical choice for the present Conference. 

Respectfully submitted, 


The Organizing Committee 
C. I. Bliss (Chairman), Connecticut Agricultural Experiment Station
A. F. Blakeslee, Smith College
W. G. Cochran, North Carolina State College
E. J. deBeer, The Wellcome Research Laboratories
H. K. Hartline, University of Pennsylvania School of Medicine
H. W. Norton, Oak Ridge, Tenn.
John von Neumann, Institute for Advanced Study
N. Rashevsky, University of Chicago
E. W. Sinnott, Yale University
J. W. Tukey, Princeton University
E. B. Wilson, Harvard University

* * * * * 


The report was accepted. Dr. Bliss moved that the Chairman name a 
committee to consider what type of international organization would 
be most suitable for biometry at the present time, to submit for con- 
sideration a draft constitution embodying its recommendations and 
to report to the Conference at the morning session on September 6. 
The motion was carried. 

The meeting was then turned over to the chairman of the scientific 
session, Dr. A. F. Blakeslee, who called on Professor Fisher for his 
paper on ‘‘A Quantitative Theory of Genetic Recombination.’’* 


Friday afternoon, September 5 


The Chairman, R. A. Fisher, proposed the following committee 
to consider the form of international cooperation: Doctors Belz, Bliss, 
Bose, Bronk, Dieulefait, Fisher, Hopkins, Linder, Neurdenburg, 
Rasch, Teissier and Tukey. This proposal was adopted. A Commit- 
tee on Resolutions was named consisting of D. G. Catcheside, Jacques 
Monod and E. B. Wilson. 

Professor Wilson then took the chair and called on several speak- 
ers to report on recent biometric developments overseas. 


* This paper, and the ensuing discussion, will be published in the March 1948 issue 
of Biometrics. 
RECENT BIOMETRIC DEVELOPMENTS IN DENMARK
G. Rasch


In reporting informally on recent developments of biometry in 
Denmark, I shall review briefly some of the problems which we have 
studied. Because they are more familiar to me, I shall deal especially 
with the investigations of my colleagues and myself. The extent to 
which our results are new or merely corroborative of similar studies 
in other countries is a question I shall leave to others. The first three 
sections deal primarily with biomathematical problems and the re- 
mainder with statistical aspects of biometry. 


BIOMATHEMATICS 


Descriptive or empirical relations. Th. Madsen, partly in collaboration
with S. Arrhenius, has described several immunological processes
in terms resembling mono- and bimolecular chemical reactions.
This work was continued by Jensen [5] and in 1941 Ipsen [2] pub- 
lished a comprehensive study of the haemolysis of red blood cells and 
the length of survival of mice, rabbits and other experimental animals 
following different concentrations of several toxins. He was able to 
reduce both types of response to an equation of the form 


(ea) gate: 


where D is the dose producing a 50 per cent response, T the corresponding
reaction time, d and t are the asymptotes for dose and time,
a and K are constants. 

In the field of growth the publication of Wetzel’s control chart for 
the growth of children [13] led me to analyze the growth in height and 
weight of 150 Danish school children. When plotted against each 
other, the logarithms of height and weight for each individual at suc- 
cessive age intervals followed a straight line very closely. The lines 
varied considerably from one child to another, both in position and in 
slope. The slopes in particular varied much more than was appar- 
ently the case for the American children on whom Wetzel based his 
growth channels [11]. 

In analyzing individual growth curves more generally, I have 
found a relation which was applicable to data on bacteria, mice, rats, 
calves and babies. The logarithm of weight can be plotted as a 
straight line against the age measured in a metric characteristic of each
species. This metric or ‘‘physiological age’’ is determined empirically. 
While varying considerably between species, it is applicable to normal 
individuals within a species [12]. 

Deductive. An interesting theory on contact epidemics has been
developed by Petersen [9]. From general assumptions on the fre- 
quency of chance contacts of an infected with an uninfected person, he 
derived a differential equation for the spreading of an epidemic. On 
integration this led to the differentiated logistic curve, which of course 
could be discriminated from other similar curves only in very large 
epidemics. But his underlying assumptions specified the dependence 
of the duration of an epidemic on the number of cases and on the size 
of the population. From the data of various epidemics (scarlatina,
influenza, etc.) in towns differing in size from 3000 to 600,000 inhabitants,
he found this relation to be satisfied. 

Other deductive relations, applying to laboratory studies, must be 
omitted here for lack of time. 

Biological standardization. Some of my first experiences in bio- 
logical assays drew my attention to an inconsistency between what bio- 
logical standardization professed to do and the results of experiments. 

The fact had emerged from early experimental work that absolute 
measurements of a drug in ‘‘frog units’’ etc. were impossible due to the 
varying response of different strains of the same species of animal. 
The introduction of standards was intended to remove this source of 
error by making each measurement a comparison with a standard 
preparation, preferably an international one. To each unknown 
preparation was ascribed a ‘‘relative potency’’ expressed in terms of 
(international) units of the standard. 

It follows from well known considerations that if an unambiguous
relative potency can be derived from an experiment, then the reaction
curves as plotted against the logarithms of the dose must be parallel. 
Experimental evidence is now available from widely different fields 
which shows that this condition is not always fulfilled (various sera, 
thyroxins, A-vitamins, tuberculins, etc.). 

The problem, however, goes much deeper. Even when the reaction 
curves are parallel, it is clear that an unambiguous statement about the 
relative potency of two substances requires that we get the same answer 
from different species of experimental animals as well as from differ- 
ent techniques applied to the same species. Neither of these condi- 
tions holds generally. Despite parallel reaction curves, Pedersen- 
Bjergaard [8] has shown that the ‘‘relative potency’’ of estrone and
of estradiol, for example, as determined in mice, rats and guinea-pigs
varied from about 1:10 to 10:1. Similar results are known for
penicillin.

The chemistry of antibiotics, hormones and vitamins is now so far 
advanced that the explanation of these facts is obvious. The prepara- 
tions compared consist of several chemical compounds, all biologically 
active but in different ways. If the compounds in two preparations 
were present in unequal proportions, all sorts of results could come out 
of various kinds of comparisons. 

The standardization of several medicaments is now chemical, but 
for many others we still have to rely on biological methods. The pres- 
ent criticism does not preclude the possibility of a biological control 
but I believe that the methods of biological standardization should be 
reconsidered. A single figure can no longer be taken as the relative 
potency of a substance unless it is chemically identical with the 
standard. 


BIOSTATISTICS 


Distributions. We have found it useful at the State Serum Insti- 
tute to study the distribution of biological data before applying the 
usual tests. One reason is technical. Consider, for example, a skew 
distribution which on some transformation (to logarithms, square roots 
or anything else) is normal. The specific parameters of the trans- 
formed distribution are the mean and the variance, but the mean and 
the variance of the original distribution are rather complicated functions 
of these specific parameters. Hence, if two sets of original observations 
are compared by the usual methods based upon the normal distribution, 
characteristics may be lost which are obvious after transformation. 
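
As an illustration of this point, the sketch below takes the special case in which the natural logarithm of the observations is normal with mean mu and variance sigma2. The choice of the logarithmic transformation and the numerical values are assumptions made here; the formulas are the standard ones for the log-normal distribution.

```python
import math


def original_mean_and_variance(mu, sigma2):
    """Mean and variance on the original scale when ln(x) is normal(mu, sigma2)."""
    mean = math.exp(mu + sigma2 / 2.0)
    variance = (math.exp(sigma2) - 1.0) * math.exp(2.0 * mu + sigma2)
    return mean, variance


# Two hypothetical populations that differ in both transformed parameters...
mean_a, var_a = original_mean_and_variance(mu=0.0, sigma2=1.0)
mean_b, var_b = original_mean_and_variance(mu=0.25, sigma2=0.5)
# ...yet have the same mean on the original scale, so a comparison of original
# means alone would not reveal the difference that is plain after transformation.
```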

The distribution as a whole is considered to be an important bio- 
logical datum in itself, which, in time, should be derived from general 
principles. Kapteyn’s distributions [6] have been derived from a general, 
although rather simple principle, viz., from stochastic differential 
equations. These are supposed to be approximate descriptions of the 
stochastic processes generating the distributions. We have been col- 
lecting data of different kinds which on transformation give normal 
distributions, with the intention of making a comprehensive study of 
these transformations. We would be especially grateful for suggestions 
which would enable us to enlarge this collection. 

Subnormal variation. In microbiology it sometimes happens, as 
shown by Berkson in the case of erythrocyte counts, that the cells 
within a preparation are distributed more evenly than would be expected 
if they were distributed independently and at random. A striking 
example of reticulocyte counts was reported in Denmark [10]. 
Using a meticulous technique, C. M. Plum obtained routine counts from 
different suspensions of the same blood sample. On statistical analysis 
these showed a variance of about 9, independent of the mean count. 
Since the number of reticulocytes varied from 100 to 500 per 1000 
erythrocytes, the variance according to the binomial law should vary 
from 90 to 250. This result led to a cooperative study [4] which included 
a control experiment with photographic technique. Our data 
also showed an extraordinarily strong subnormal variation. We hope 
that experiments now in progress will provide an explanation. 
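
The binomial figures quoted above can be checked directly. The short sketch below uses the numbers given in the text (counts of 100 to 500 reticulocytes per 1000 erythrocytes, observed variance about 9); the function names are chosen here for illustration.

```python
def binomial_count_variance(expected_count, n):
    """Variance of a binomial count whose mean is expected_count out of n cells."""
    return expected_count * (n - expected_count) / n


def dispersion_ratio(observed_variance, expected_count, n):
    """Observed variance divided by the binomial variance; below 1 is subnormal."""
    return observed_variance / binomial_count_variance(expected_count, n)


low = binomial_count_variance(100, 1000)    # binomial variance at 100 per 1000
high = binomial_count_variance(500, 1000)   # binomial variance at 500 per 1000
ratio_low = dispersion_ratio(9.0, 100, 1000)
ratio_high = dispersion_ratio(9.0, 500, 1000)
```

On these figures the observed variance is roughly one tenth to one twenty-eighth of the binomial expectation.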

Experimental design. In his work on biological standardization, 
Ipsen realized that mortality rates sometimes are relatively inefficient 
measures of response. He therefore recorded the survival time of the 
animals that died and developed a procedure for handling such data. 
By this means he increased by four-fold the efficiency of an experiment 
on the standardization of tetanus serum [3]. 

In experiments by Engbaek [1], groups of mice were given amounts 
of living culture of Bacillus Pfeiffer, type b, varying from 10*° to 10*° 
bacteria. In this case neither the mortality rate nor the survival time 
gave a satisfactory reaction curve. The multiplication of the bacteria 
and their dispersal through the blood would seem to bear directly upon 
the development of the disease. Accordingly we studied the curve of 
bacteriemia in 18 mice by means of blood samples taken at frequent 
intervals. Our results indicated that the number of bacteria per cc. 
of blood should be determined routinely four hours after inoculation. 
This gave a steep reaction curve representing an increase in efficiency 
of at least 1000 per cent. 

The above two examples emphasize the importance of a careful pre- 
liminary investigation for selecting a suitable experimental object. 


All references for Recent Biometric Developments in Denmark are 
listed on page 189. 


RECENT WORK ON ‘‘INCOMPLETE BLOCK DESIGNS’’ 
IN INDIA 


R. C. BOSE 


The main principles of ‘‘experimental design’’ as it is understood 
at present were developed by R. A. Fisher and his associates at Roth- 
amsted in England during the years immediately following the first 
world war. The three basic principles of ‘‘experimental design’’ are 
‘‘randomization’’, ‘‘replication’’ and ‘‘local control’’. As Fisher has 
remarked, design and analysis are two aspects of the same logical whole, 
and randomization and replication are necessary in order that valid 
tests of significance based on the use of ‘‘t’’ and ‘‘z’’ tests should be 
possible. But to increase the efficiency of the experiment, ‘‘local con- 
trol’’ is necessary. The main principle is to divide the experimental 
material into relatively homogeneous parts called ‘‘blocks’’, and to so 
arrange the analysis that in making treatment comparisons ‘‘block 
effects’’ might be eliminated. Since any difference in the responses 
of the experimental units within the same block goes to swell the error 
(which serves as a measuring rod for the significance of the treatment 
differences) the experiment tends to become inefficient when the block 
size is large. We have, therefore, to find a way of reducing the block 
size. This has given rise to the device of ‘‘confounding’’ in the case 
of factorial designs, and the use of ‘‘incomplete block designs’’ in the 
case of trials with a large number of varieties. I shall deal with some 
of the recent work done in India on the latter class of designs. 

The incomplete block designs most commonly in use are the ‘‘lattice 
designs’’ and the ‘‘balanced incomplete block designs’’. In the latter 
‘‘v’’ treatments have to be arranged in ‘‘b’’ blocks of ‘‘k’’ experimental 
units each, so that every treatment is replicated ‘‘r’’ times, and 
each pair of treatments occurs together in λ blocks (the number λ 
being independent of the pair of treatments we start with). It is 
readily seen that 

\[ bk = vr, \qquad \lambda(v - 1) = r(k - 1), \]


and Fisher has shown that 

\[ b \geq v. \]
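
A small sketch of how a proposed parameter set may be screened against the conditions bk = vr, λ(v − 1) = r(k − 1) and b ≥ v. The function names are chosen here for illustration. The conditions are necessary only: passing the check does not show that an arrangement exists.

```python
def bibd_parameters_consistent(v, b, r, k, lam):
    """True when (v, b, r, k, lam) meets the three necessary conditions."""
    return b * k == v * r and lam * (v - 1) == r * (k - 1) and b >= v


def complete_parameters(v, b, r):
    """Derive k and lam from v, b, r; return None if either is not a whole number."""
    if (v * r) % b:
        return None
    k = v * r // b
    if (r * (k - 1)) % (v - 1):
        return None
    return k, r * (k - 1) // (v - 1)


ok = bibd_parameters_consistent(25, 50, 8, 4, 1)
bad = bibd_parameters_consistent(25, 50, 8, 4, 2)
derived = complete_parameters(16, 24, 9)
impossible = complete_parameters(10, 7, 3)
```

The second helper is useful when only v, b and r are legible or known, since k and λ are then fixed by the two identities.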


Given the parameters v, b, r, k, λ, satisfying the above conditions, 
the combinatorial problem of actually producing the required arrangement 
still remains. The practically useful cases are when r ≤ 10, and 
a number of these were listed in Fisher and Yates’ Tables. But many 
blanks remained. Indian workers have succeeded in filling up most 
of these blanks. Thus solutions have been found for the cases 


v=25, b=50, r= 8, k= 4, λ=1 
[three parameter sets illegible in the source] 
v=16, b=24, r= 9, k= 6, λ=3 
v=31, b=31, r=10, k=10, λ=3 
v=21, b=30, r=10, k= 7, λ=3 


v=22, b=22, r= 7, k= 7, λ=2 
v=15, b=21, r= 7, k= 5, λ=2 
v=29, b=29, r= 8, k= 8, λ=2 
v=21, b=28, r= 8, k= 6, λ=2 


Some of these new solutions were included in the second edition of 
Fisher and Yates’ Tables and others are going to be included in the 
forthcoming edition. Only four cases with r ≤ 10 yet remain unknown. 
These are 


v=46, b=46, r=10, k=10, λ=2 
v=36, b=45, r=10, k= 8, λ=2 
v=46, b=69, r= 9, k= 6, λ=1 
[one parameter set illegible in the source] 


The total number of available balanced incomplete block designs is 
unfortunately very restricted. To a certain extent this deficiency is 
made good by Lattice designs in which, though every pair of treat- 
ments is not compared with the same accuracy, the analysis is simple 
and straightforward. In order to fit in with various experimental re- 
quirements which arise in practice, a much more flexible system of 
designs is required. Such a system can be obtained by the following 
considerations. It can be shown that in any incomplete block experiment 
in which the block size k is constant, the treatment effects can be 
estimated by solving the following system of normal equations 


\[ c_{i1} t_1 + c_{i2} t_2 + \cdots + c_{iv} t_v = Q_i \qquad (i = 1, 2, \ldots, v), \]


where v is the number of treatments, $t_j$ is the effect of the j-th 
treatment, and $Q_i$ is the ‘‘adjusted yield’’ corresponding to the i-th 
treatment (i.e. $Q_i$ is the result obtained by subtracting from the total 
yield of the i-th treatment the sum of the average yields per plot of the 
blocks in which the treatment occurs). Also 


\[ c_{ii} = r_i \left(1 - \frac{1}{k}\right), \qquad c_{ij} = -\frac{\lambda_{ij}}{k} \quad (i \neq j), \]


where $r_i$ is the number of replications of the i-th treatment, and 
$\lambda_{ij}$ is the number of blocks in which the i-th and j-th treatments 
occur together. In general, therefore, a system of v linear equations has to 
be solved, in order to estimate treatment effects. This is impracticable 
when v is large. We have, therefore, to impose combinatorial condi- 
tions, which should enable us to obtain an easy solution of the normal 
equations. One such set of conditions is the following: 


(1) Each block is of constant size and every treatment is replicated 
the same number of times. 

(2) With respect to any treatment θ the remaining ones can be 
divided into m groups of $n_1, n_2, \ldots, n_m$ each, so that treatments of the 
i-th group occur with the given treatment just $\lambda_i$ times. (The numbers 
$n_1, n_2, \ldots, n_m$, $\lambda_1, \lambda_2, \ldots, \lambda_m$ are constants of the design, and are 
independent of the treatment with which we start.) The treatments of the 
i-th group are said to be i-associates of the original treatment. 

(3) The relation of association is symmetrical. Thus if the treatment 
φ is an i-associate of θ, then θ is an i-associate of φ. If $p_{jk}^{i}$ is the 
number of treatments which are both j-associates of θ and k-associates 
of φ, then the number $p_{jk}^{i}$ is independent of the pair of i-associates 
θ and φ with which we start. 

Designs satisfying the above conditions have been called partially 
balanced incomplete block designs and include as special cases the 
‘‘balanced incomplete block designs’’ and the ‘‘lattice designs’’. The 
treatment differences are estimated with m different accuracies, and the 
solution of the normal equations can be made to depend on a solution 
of m simultaneous linear equations. Of course the practically useful 
cases are when m is small, say m = 3. 

Though a large number of illustrative cases have been worked out, 
a complete listing of the practically useful cases together with their 
combinatorial solutions remains to be undertaken. I am gratified to 
find that in the United States incomplete block designs, especially lattices, 
are being used on a large scale, and hope that before my departure 
from this country some of the new designs found by Indian 
workers may be actually tested in the field. 


BIOMETRIC WORK IN AUSTRALIA 
MAURICE BELZ 


In Australia the greater part of systematic biometric research is 
carried out by the Council for Scientific and Industrial Research 
through its various laboratories and institutions in the several states. 
This Council recruits its statisticians, in general, from University grad- 
uates who have had basic training in mathematics and, preferably, ad- 
vanced training in mathematical statistics. The Council passes these 
recruits through a preliminary period of practical training and then 
attaches them to the appropriate research department. 

University training in statistics at the undergraduate level is pro- 
vided in most of the Australian universities. In Melbourne, for ex- 
ample, there are three courses available: (1) a pass course in theory of 
statistics, taken in the second year and following a fairly complete first 
year course of calculus, in which the standard distributions and tests 
are fully described and illustrated; (2) an honor course in mathe- 
matical statistics, taken either by mathematical graduates or by honor 
undergraduates who have completed a course in the theory of functions; 
(3) a descriptive course in statistical methods for research workers, 
mainly graduates in bacteriology, biochemistry, pharmacology, 
physiology, etc., many of whom have had a basic training in algebra 
and the elements of calculus. This training is usually acquired at 
school, a special subject to meet the needs of such people being included 
among the matriculation group of subjects. All the above courses are 
increasing steadily in popularity. 


BIOMETRIC DEVELOPMENTS IN THE NETHERLANDS 
M. G. NEURDENBURG 


I regret that I am unable to review the situation of the statistical 
field in the Netherlands. The main purpose of my visit is to learn, 
instead, of the things you are doing in the United States. As a med- 
ical man, I would underline the necessity of thinking biologically. 

Statistics on infant welfare, hospitalization, etc., are basic for the 
organization of practical work such as social security. Agriculturalists, 
too, are working seriously on statistics in Holland. 


DISCUSSION 


D. Nanda. I shall give a brief view of the plant-breeding work 
which is being done in India, much of it by Dr. V. G. Panse at the 
Institute of Plant Industry at Indore. 

Plant-breeding requires a study of the variability in the existing 
material. The fact that environmental variability is not of use to the 
plant-breeder and that only genetic variability leads to the improve- 
ment of a crop requires a separation of the genetic and non-genetic 
variation. The regression of the progeny means upon the parental 
values has been determined to give the proportion of the genetic to the 
total variation. This method is often used as a measure of the genetic 
variability and if the material is rich enough, the breeder initiates some 
selection. The selection program is carried out mostly with the help 
of the progeny row technique. 
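
The regression referred to above can be sketched as an ordinary least-squares slope of progeny means on parental values. The data below are invented for illustration, and how the slope is converted into a proportion of genetic to total variation depends on the breeding system and on whether one parent or the mean of both is used, which this sketch does not attempt to settle.

```python
def regression_slope(parent_values, progeny_means):
    """Least-squares slope of progeny means on parental values."""
    n = len(parent_values)
    mean_x = sum(parent_values) / n
    mean_y = sum(progeny_means) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(parent_values, progeny_means))
    sxx = sum((x - mean_x) ** 2 for x in parent_values)
    return sxy / sxx


# Invented values: progeny means move half as far from the average as their parents do.
parents = [1.0, 2.0, 3.0, 4.0]
progeny = [2.0, 2.5, 3.0, 3.5]
slope = regression_slope(parents, progeny)
```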

If the material lacks genetic variability, the breeder hybridizes to 
create variability. The progenies from different crosses are quite 
heterogeneous due to the different combination of genes. The vari- 
ability so created is both genetic and non-genetic. If the material is 
grown in a randomized and replicated trial the variation can be sepa- 
rated into different parts. Such a study also reveals the effects of 
hybrid vigor and heterosis. The equations for this problem were given 
by Professor Fisher when he visited India in 1938. Further, several 
experiments are being performed to compare the yielding capacity of 
pure lines and crosses at various stages of propagation, i.e. F1, F2, F3, 
etc. 

Another technique in common use is to select material on the basis 
of a discriminant function. This method was given by Fairfield 
Smith and is being developed further by Panse. The genetic advance 
achieved by selecting a certain percentage on the basis of discriminant 
function is subject to sampling error. Bartlett gave a method in 1939 
(Supplement to The Journal of the Royal Statistical Society) for 
determining the standard error of the genetic advance. I have tried 
to develop this theory further and have obtained the standard errors 
of the genetic advance and of the regression coefficients of the 
discriminant function model. 


N. Rashevsky. I wish to review briefly the discussions of this after- 
noon’s session. Our colleagues from the Old World have done amazing 
work in their respective countries and we can only bow in admiration 
to them for having achieved so much under such adverse conditions. 
Let us look, however, at the scope of the papers presented and discussed 
today. They are all strictly statistical in their nature. But 
an application of mathematics to any natural phenomenon goes far 
beyond the application of statistical method. The criterion of any sci- 
entific achievement is predictability. Frequently by the use of statis- 
tics we can predict rather precisely the future course of some events, 
just as by observing carefully the motion of a golf ball we might pre- 
dict whether it will land in a hole or not. But you all will agree that 
there is a difference between this type of prediction and those exemplified 
by Maxwell’s prediction of electromagnetic waves or by Einstein’s 
prediction of the transmutability of matter into energy. These 
latter predictions are made by first formulating some quantitative 
hypotheses concerning the intimate mechanisms underlying a set of 
observable phenomena. By means of mathematical analysis consequences 
are drawn from this set of hypotheses and compared with 
available data. If the hypothesis is a good one, not only will its con- 
sequences agree with known facts, but they will lead to the prediction 
of new hitherto unknown phenomena. 

I wish to illustrate with a few figures the application of this method 
to biology, as it has been carried out in the last twelve years at the 
University of Chicago, although time does not permit me to explain 
the hypotheses. By making certain assumptions about the mechanism 
underlying cell division, mathematical equations describing the elongation 
and constriction of freely dividing cells have been derived by Professor 
H. D. Landahl. To test those equations experiments were made 
by Buchsbaum and Williamson here at Woods Hole. A typical case of 
agreement between the predicted course of the phenomenon and that 
observed is shown in Figure 1. I wish to emphasize that what you see 
here is not an empirical fitting by statistical methods of a curve to an 
observed set of points. The analytical expression for these curves was 
obtained long before the experiment was planned. 

Professor Landahl and others have also contributed to the theo- 
retical study of the functions of the central nervous system. By mak- 
ing some rather complex assumptions about the nature of interaction 
of the neurons in the brain, conclusions can be drawn, for instance, as 
to the dependence of the reaction time to a stimulus on the interval 
between that stimulus and a ‘‘preparatory’’ stimulus given as a warning 
in advance. The comparison of the theory and the experiment, 
taken also from Landahl’s publication, is shown in Figure 2. 

[Figure 2: reaction time in seconds against preparatory interval in seconds] 

Figure 3 shows the agreement between theory and experiment on 
the relation between the intensity of a stimulus and the just noticeable 
difference between two stimuli which still can be discriminated. These 
are typical curves taken from a series of papers by Householder. 

I have no time to discuss a number of other important contributions 
in the field of mathematical biophysics such as Landahl’s studies on 
learning, Weinberg’s work on excitation, Williamson’s work on electrical 
properties of cells, Bloch’s work on permeability, Sacher’s work 
on periodicities in neural circuits, Morales’ work on gas exchanges, and 
many others. I also cannot mention the innumerable questions to 
which we still have no answers. However small our knowledge and 
however great our ignorance, I repeat that only through the harmonious 
simultaneous development of all possible approaches to mathematical 
biology shall future progress be made. 


Saturday morning, September 6 


This session, under the chairmanship of Drs. Teissier and Dieule- 
fait, dealt with international cooperation in Biometrics. The first 
speaker was Dr. D. W. Bronk of the National Research Council. Dr. 
Bronk emphasized the importance of free communication between the 
scientists of all nations in establishing a stable peace, and discussed 
the various forms of international organization which have been 
adopted in the past. 

Dr. Dieulefait then spoke of the reorganization of the International 
Statistical Institute. Under its new statutes it would include not only 
governmental and economic statistics but also mathematical statistics 
and its applications. He suggested that the society which might be 
formed this morning should consider affiliation with I.S.I. 

Professor R. A. Fisher then submitted the report of the Committee 
on International Organization, which recommended the formation of 
an international membership society. A draft constitution was dis- 
tributed in mimeographed form and considered article by article under 
the chairmanship of Dr. Maurice Belz. 

The inclusion of the word ‘‘international’’ in the name of the 
Society was debated first. It was finally agreed that the name of the 
organization shall be the ‘‘ Biometric Society’’ and that a short descrip- 
tion clarifying the objectives of the Society shall appear on its letter- 
heads. After further small modifications, the draft constitution was 
adopted. 

It was then resolved that all persons present at the Conference, or 
invited but unable to attend, shall be Charter Members of the Biometric 
Society if they so desire. It was further resolved that the Committee 
on International Organization and such other Members as it may 
designate shall constitute the first Council of the Society. 

The Conference, then sitting as the Biometric Society, unanimously 
adopted the constitution as amended during the morning session. A 
copy of this constitution follows. 


THE BIOMETRIC SOCIETY: CONSTITUTION 
1. Scope of the Society 


The Biometric Society is an international society for the advance- 
ment of quantitative biological science through the development of 
quantitative theories and the application, development and dissemi- 
nation of effective mathematical and statistical techniques. To this 
end the society welcomes to membership biologists, mathematicians, 
statisticians and others interested in applying similar techniques. 


2. Members 


The Society shall have one class of members. To become a member, 
a person must be proposed by two members of the Society and ap- 
proved by the Council. The Council may delegate this authority. The 
members represent the highest authority of the Society. The Council 
shall consult them on any vital questions that may affect the policy of 
the Society as a whole, obtaining their decision by a mail vote. 


3. Officers 


The general officers of the Society shall be the President, the Secre- 
tary and the Treasurer. The regional officers shall be the Vice-Presi- 
dents, Regional Secretaries and Regional Treasurers. The officers shall 
be elected by the Council for one year. 

The President shall act as chairman of the Council. 

Each Vice-President shall represent a specified region. The regions 
shall be determined by the Council from time to time. 

The Treasurer shall present financial statements to the Council and 
shall bring condensed statements to the attention of the members. The 
offices of Secretary and Treasurer may be combined. 

No President or Vice-President shall serve more than two consecu- 
tive years. 

The Council shall elect Regional Committees to serve under the 
chairmanship of each Vice-President and may elect Regional Secre- 
taries and Regional Treasurers where that seems advisable. 

The general and regional officers, acting as a nominating committee, 
shall submit to the Council a list of candidates for general and regional 
officers but members of the Council may vote for names not on this list. 
Regional officers and committeemen may be nominated directly by the 
members at regional business meetings. 


4. The Council 


The President, the Vice-Presidents, the Secretary, and the Trea- 
surer shall be ex-officio members of the Council. There shall not be 
less than twelve nor more than twenty ordinary members of the Coun- 
cil, who shall be elected with a view to representing the various geo- 
graphic areas and fields of activity in which the Society has members. 
Ordinary members of the Council shall be elected for three years by a 
mail vote of the members, the terms of approximately one-third of them 
terminating each year. 

The general and regional officers of the Society, acting as a nomi- 
nating committee, shall submit to the members a list of names of twice 
the number of candidates necessary to fill the ordinary vacancies in the 
Council, but members may vote for names not on this list. No ordi- 
nary member of the Council shall serve more than two consecutive 
terms of three years each. 

In matters of purely regional importance, the appropriate Regional 
Committee may act for the Council, but the Council may reverse its 
action. 

5. Activities 


Any activities which fall within the sphere of interest of the Society 
may be authorized by the Council, such as international and local sci- 
entific meetings and the issuance of publications reporting the activities 
of the Society or containing other matters of biometric interest. 

The Society may affiliate itself with international organizations. 
With Council approval a Region may affiliate itself with regional or 
national organizations. 


6. Financial Organization 


The dues for members shall be fixed by the Council. Bequests and 
gifts may be received. 


7. Amendment of the Constitution 


Amendments to the Constitution must be approved by the Council, 
or in writing by at least five per cent of the members, and before be- 
coming effective they must be ratified by a two-thirds majority of those 
voting in a mail vote taken among all members. 


Saturday afternoon, September 6 


The Chairman, Leslie F. Nims, called upon Professor Wilson, who 
presented the following resolutions. 


“The members of the First International Biometric Conference held 
at Woods Hole, September 5-6, 1947, desire to express their profound 
thanks to the Marine Biological Laboratory, to the Rockefeller Founda- 
tion, to the Connecticut Agricultural Experiment Station, to the National 
Research Council and to the Officers and Organizing Committee of the 
Conference for providing facilities and making arrangements which have 
resulted in an initial Conference of great present success and high hope 
for the future.” 


It was voted unanimously that the Secretary be directed to transmit 
this resolution to all parties concerned. The Chairman then intro- 
duced Professor Teissier, who gave his paper* on ‘‘La Relation d’Allo- 
metrie, sa Signification Statistique et sa Logique’’. Dr. J. Monod 
opened the discussion by summarizing Dr. Teissier’s address in English. 

The First International Biometric Conference was concluded with 
the following remarks by Professor Teissier. 


CLOSING ADDRESS OF THE FIRST INTERNATIONAL 
BIOMETRIC CONFERENCE 


GEORGES TEISSIER 


Before we part, my dear colleagues, I should like to thank you once 
more for the honor you did me in asking me to preside over your 
discussions, and to renew my apologies for the quite involuntary delay 
which deprived me of the pleasure of opening your conference and of 
hearing Professor Fisher’s address. 

I should also like to thank, in the name of all of you, those who took 
the initiative for this meeting and who succeeded in organizing it so 
happily. Our gratitude goes most particularly to Dr. Bliss. We have 
long known what Biometry owes him, and I am not about to forget how 
useful his methods of calculation have been to me personally in certain 
circumstances. There are now several dozen of us biometricians who 
know, in addition, that he is also a skillful organizer and the most 
congenial of hosts. May he receive here, publicly, the assurance of the 
gratitude of us all. 

Since the office you have been kind enough to entrust to me gives me 
the right to speak last, I shall allow myself, in conclusion, to offer you 
a few reflections on our new association. 

At the point which Statistics and Biology have reached in their 
development, I believe that an undertaking such as the one which brings 
us together here had become necessary, but I also believe that the 
enterprise on which we are embarking with a common impulse is a 
difficult one. It will succeed fully only if, from the very beginning, we 
are clearly aware of the exact nature of the task that awaits us. 

Our Society must not be a new association of statisticians, nor yet a 
new association of biologists. All of us belong to such societies and we 
shall no doubt continue to take part as best we can in their work. In the 
Biometric Society we shall have to work together to advance Biology by 
means of Statistics. For that we must first understand one another well. 
Some of us have a chiefly mathematical training and know biological 
problems rather poorly; others, chiefly biologists, have not carried their 
mathematical studies very far. Very few are those who have had the 
good fortune to benefit, from their youth, from that twofold 
mathematical and biological education which nevertheless appears 
essential to a full understanding of the science we cultivate. 

We shall therefore have to instruct one another, and this must be our 
first objective: the statistician learning from contact with the biologist 
what the latter’s dominant concerns are, the biologist learning from his 
statistician colleague the difficult art of posing a quantitative problem 
correctly. We shall then have to recruit, among the young, future 
biometricians whom we must train, better than we ourselves were 
trained, in this fascinating but exacting discipline. 

The fact that we are gathered here today, so numerous and from so 
many different horizons, proves that we have understood how useful to 
the progress of science is this meeting of men of dissimilar origin and 
training, but moved by the same desire for disinterested knowledge. We 
may augur well for our common future if we can interest a greater 
number of biologists in our efforts and if we can show them what 
marvelous aid the mathematical instrument, wisely handled, brings to 
the study of the phenomena of life. 

The dual organization of our association, at once international and 
regional, will help us to secure this recruitment, and it is with confidence 
that we can launch the new Biometric Society across the world. 


* This paper and the ensuing discussion will be published in the March issue of 
Biometrics. 


THE BIOMETRIC SOCIETY 
REPORT OF COUNCIL MEETINGS 


First meeting of the Council, Woods Hole, September 6, 1947, with 
the following present: Belz, Bliss, Bose, Fisher, Hopkins, Linder, 
Teissier, Rasch and Tukey with Monod as interpreter and guest. As 
the first order of business, Professors Cox, Haldane and Wilson were 
elected members of the Council. Since she was still in Woods Hole, 
Miss Cox joined the Council for the remainder of the meeting. As its 
first officers the Council elected R. A. Fisher, President; J. W. Hopkins, 
Treasurer; and C. I. Bliss, Secretary. 

The publication of the Proceedings of the Woods Hole Conference 
and the policy of the Society in respect to a journal were discussed. 
As editor of Biometrics, Miss Cox offered to publish the Proceedings 
in the December issue (1947) and the two main papers by Professors 
Fisher and Teissier with the ensuing discussions in the March 1948 
issue. After considering an alternative proposal of The American 
Naturalist, the Council agreed to accept Miss Cox’s offer. It was 
agreed that the papers from the Proceedings should be published in 
the language in which they were delivered. 

Members of the Council favored an arrangement with the American 
Statistical Association, by which Biometrics could be used as the offi- 
cial journal of the Biometric Society during 1948. It was agreed that 
dues should include a subscription to Biometrics, except that the
members already receiving the journal through the Biometrics Section 
would receive a rebate. Before adjourning, the Council agreed to 
meet next in Washington. 

Second meeting of the Council, Washington, D. C., September 15,
1947, with the following present: Belz, Bliss, Bose, Cox, Dieulefait, 
Fisher, Linder, Neurdenburg, Rasch and Tukey. The following were 
elected to the Council: Buzzati-Traverso, Cole, Demerec, Goulden, 
Johnson, Needham and Simpson. 

It was decided to set up a skeleton regional organization but allow 
members the option of belonging to a given region or to be members at 
large. Four regions were set up with officers: British Region, Indian 
Region, Western American Region (including the American Pacific 
coast, western Canada and Mexico) and Eastern American Region (in- 
cluding eastern Canada). The following five regions were to be acti- 
vated as soon as it seemed desirable: Scandinavian Region (Denmark, 
Norway, Sweden and Finland), Benelux Region (Belgium, Netherlands 
and Luxemburg), Australian Region, French Region and Russian 
Region. Each member of the Council agreed to handle publicity for 
the new Society in his region. 

As an incentive to membership it was decided to consider everyone 
joining before February 1, 1948, as a charter member of the Society. 
The Secretary was given instructions by the Council, which would 
facilitate the election of additional charter members. 

In accord with Article 5 of the Constitution, the Council approved 
affiliation of the Society with the International Council of Scientific 
Unions if such could be arranged. President Fisher was authorized 
to review the matter with the General Secretary of the I.C.S.U. The
Council also approved affiliation with the International Statistical
Institute, when its statutes make this possible. The Eastern American 
Region was encouraged to affiliate with the National Research Council’s 
Division of Biology and Agriculture, with the American Association 
for the Advancement of Science and with the American Statistical 
Association. 

In a discussion of finances the Secretary reported that a balance of 
possibly $300 from the Rockefeller grant in support of the Woods 
Hole Conference would be available for printing and distributing the 
Proceedings of the meetings of September 5th and 6th but not for the 
use of the new Society. The Board of Directors of the American 
Statistical Association has offered to spend up to $100 for printing 
extra copies of the December issue of Biometrics for distribution to 
prospective members of the Biometric Society who would become
subscribers. In consideration of the probable cost of Biometrics to the 
Society in bulk subscriptions, the Council voted dues of $4.00 for 1948 
with a rebate of $2.00 to those who are also members of the Biometrics 
Section. The policy was approved of having dues collected by the 
regional secretary-treasurers who would be authorized to retain $1.00 
or its equivalent for regional use. In view of the limitations upon 
international exchange, dues are to be payable in local currencies 
except in an amount sufficient to cover a subscription to Biometrics, 
which must be received in dollars before the journal can be mailed. 

The letterhead of the Society is required to carry a descriptive 
clause beneath its name. After some discussion, the Council decided 
on the wording ‘‘An International Society Devoted to the
Mathematical and Statistical Aspects of Biology.’’ 

C. I. BLISS 
Secretary 
ABSTRACTS 


(42) 

UPHOLT, WILLIAM M. AND SCUDDER, H. I. (U. S. Public 
Health Service.) A Problem in Sampling Mobile Insect Populations. 

Some insect population distributions are so extremely ‘‘con- 
tagious’’ that a reasonable number of random samples does not 
provide a satisfactory estimate of the mean. In certain cases such 
as house flies the insect itself is so highly mobile that no satisfactory 
method of obtaining a truly random sample has been devised. To 
meet this situation one of the authors has suggested a sampling 
method which apparently provides an estimate of the magnitude of 
the greatest concentrations of flies. These estimates appear to be 
reproducible and to have a relatively small, though undetermined, 
variance. The problem presented is the statistical validity of, and 
the selection of proper statistical techniques to use with, such 
samples which appear to be empirically justified from the stand- 
point of field biologists. 


(43) 

COOK, S. F. (U. C. L. A.). Survivorship in Aboriginal Populations. 

Survivorship in truly aboriginal populations can be determined 
by only two methods: (1) censuses taken by members of an invad- 
ing culture prior to any disturbance caused by such invasion and 
(2) determinations of the age at death of skeletal remains. It has 
been possible to assemble data concerning several such populations, 
including Tasmanians, East Africans and a number of North 
American aboriginal groups. The mean age at death varies widely. 
Thus the Pecos Indians (Hooton) and those of the Valley of Mexico 
appear to have survived to a relatively old age whereas the Tasma- 
nians (Todd) and the California Indian tribes tended to die at a 
very early age. The conclusion is reached that survivorship is not 
an inherent or genetic characteristic but is dependent purely upon 
environmental conditions and habits of life. Moreover the data 
studied appear to indicate that in peoples wholly untouched by out- 
side influences infant mortality is less than has been commonly
supposed. 
QUERIES 


(54) 
QUERY: This is purely a computational matter, but one which has
bothered me frequently, and upon which I should like to get
additional information if there is any available. In many
calculations I multiply a series of quantities x_1, x_2, x_3, ..., x_n
by a constant factor k, rounding the individual products to the
nearest integer. As a check I compare S(kx) (rounded) with k(Sx). 

Now the difficulty is that I have no idea of the expected
distribution of the difference S(kx) - k(Sx), beyond the obvious facts
that it has minimum 0 and maximum n/2 (disregarding signs). Through
long experience I have arrived at the conclusion that where n is
around 20-50, a value of S(kx) - k(Sx) of over 2 usually indicates an
error in one or more of the individual products. I should like to 
treat the matter more exactly if possible, and wondered if you were 
aware of anything on the subject in the various works for com- 
puters, or elsewhere. 


ANSWER: Your problem is that of the distribution of means (or
totals) of samples from a rectangular population, here the
population of rounding errors, which range from -1/2 to +1/2. This
distribution is discussed in The Advanced Theory of Statistics,
Vol. I, by M. G. Kendall on pages 240-242.
If we denote the total of n sample values from this distribution by
z, the distribution of z has the mean zero and the variance n/12.
Since the distribution of z tends to normality with increasing n, for
moderately large values of n we may ascribe to a total a standard
error of √(n/12). The difference between S(kx) and k(Sx) should
exceed twice this standard error in only about 5 percent of cases.
If n is, for example, 50, the standard error of the total will be
about 2 (that is, √(50/12)). If the discrepancy is greater than 4,
there is good reason to suppose that some error of calculation has
been made. Exact levels of significance have not so far as I know
been evaluated, but the above rule should be reasonably accurate.
OSCAR KEMPTHORNE 
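
The rule in this answer can be tried numerically. The following is a minimal sketch in Python, not part of the original reply: the function names, the constant k = 3.7, and the made-up range of the x values are all assumptions chosen for illustration. It computes the check quantity S(kx) - k(Sx), the standard error √(n/12), and simulates how often the discrepancy exceeds twice that standard error.

```python
import math
import random


def rounding_discrepancy(values, k):
    """Return S(round(k*x)) - k*S(x), the quantity used as a check."""
    rounded_total = sum(round(k * x) for x in values)
    return rounded_total - k * sum(values)


def standard_error(n):
    """Standard error of a total of n rounding errors, each assumed
    uniform on (-1/2, +1/2), so that the variance of the total is n/12."""
    return math.sqrt(n / 12)


def is_suspicious(values, k):
    """Rule of thumb from the answer: flag a discrepancy larger than
    twice the standard error of the total."""
    return abs(rounding_discrepancy(values, k)) > 2 * standard_error(len(values))


def simulate(n=50, k=3.7, trials=5000, seed=1):
    """Monte Carlo check on hypothetical data. Returns the root mean
    square discrepancy (the mean is taken as zero) and the fraction of
    trials exceeding twice the theoretical standard error."""
    rng = random.Random(seed)
    limit = 2 * standard_error(n)
    sum_sq = 0.0
    exceed = 0
    for _ in range(trials):
        xs = [rng.uniform(0, 1000) for _ in range(n)]  # made-up quantities
        d = rounding_discrepancy(xs, k)
        sum_sq += d * d
        if abs(d) > limit:
            exceed += 1
    return math.sqrt(sum_sq / trials), exceed / trials


if __name__ == "__main__":
    sd, frac = simulate()
    print("theoretical standard error:", standard_error(50))
    print("simulated standard deviation:", sd)
    print("fraction beyond twice the standard error:", frac)
```

The simulation assumes that the fractional parts of the products kx are effectively uniform, which is the same assumption the rectangular-population argument rests on. When the x values are few or k is a simple fraction, that assumption may fail and the rule should be applied with more caution.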
(55) 

QUERY: This is to request information as to the applicability of
the analysis of variance procedures and the F-test to the study of
the significance of differences between classifications of variances
rather than means. That is, in a table of variances (s²) as follows,
could the variance of the variances by rows, by columns, and a
residual (interaction) be computed following the usual procedure, and
could the significance of the differences in the mean variances for
the columns or rows then be determined by the F-test? 


          s²_11    s²_21    s²_31
          s²_12    s²_22    s²_32
          s²_13    s²_23    s²_33


ANSWER: Bartlett and Kendall discuss this problem in ‘‘The
Statistical Analysis of Variance-Heterogeneity and the Logarithmic
Transformation’’, Supplement to the Journal of the Royal
Statistical Society, Vol. 8, pages 128-138, 1946. If you have
independent samples of approximately the same size, if they are drawn 
pendent samples of approximately the same size, if they are drawn 
from normally distributed populations, and if they have ten or more 
degrees of freedom, you can transform to the logarithms of the 
variances and analyze the variance of the resulting logarithms in 
the usual manner. Even if the degrees of freedom in the samples 
are as few as five, the method may be used tentatively. 
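
As an illustration of the procedure just described, the sketch below (in Python, with hypothetical function and key names and no data from the query) takes a rows-by-columns table of sample variances, transforms each to its natural logarithm, and carries out the usual two-way analysis of variance with one observation per cell. It returns the sums of squares, degrees of freedom, and F ratios. Significance levels are left to the reader's F tables.

```python
import math


def log_variance_anova(variances):
    """Two-way analysis of variance (one value per cell) on the
    logarithms of a table of sample variances.

    variances: list of rows, each a list of positive sample variances.
    Returns a dict of sums of squares, degrees of freedom and F ratios.
    """
    r = len(variances)
    c = len(variances[0])
    y = [[math.log(v) for v in row] for row in variances]

    grand = sum(sum(row) for row in y) / (r * c)
    row_means = [sum(row) / c for row in y]
    col_means = [sum(y[i][j] for i in range(r)) / r for j in range(c)]

    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    # Residual (interaction) computed directly from the cell residuals.
    ss_resid = sum(
        (y[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )

    df_rows, df_cols = r - 1, c - 1
    df_resid = df_rows * df_cols
    ms_resid = ss_resid / df_resid

    def f_ratio(ss, df):
        # Guard against a zero residual mean square (perfectly additive table).
        return (ss / df) / ms_resid if ms_resid > 0 else float("inf")

    return {
        "ss_rows": ss_rows, "ss_cols": ss_cols, "ss_resid": ss_resid,
        "df_rows": df_rows, "df_cols": df_cols, "df_resid": df_resid,
        "f_rows": f_ratio(ss_rows, df_rows),
        "f_cols": f_ratio(ss_cols, df_cols),
    }
```

The sketch assumes the conditions stated in the answer: independent samples of roughly equal size from normal populations, each with about ten or more degrees of freedom. It does not check them.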

If you need information as to why transformation is necessary,
you will find excellent discussions of analysis of variance by
Eisenhart, Cochran, and Bartlett in the March issue of Biometrics. 

Other articles describing approximate tests of homogeneity of 
variances are as follows: Stevens in the Journal of Genetics, Vol. 33, 
page 398; Cochran in the Supplement to the Journal of the Royal 
Statistical Society, Vol. 4, page 102; and Bartlett in the same Jour- 
nal, Vol. 4, page 187. 

WALTER T. FEDERER 
NEWS AND NOTES 


G. E. DICKERSON, formerly with the Regional Swine Breeding 
Laboratory at Ames, Iowa, has a new address—Animal Husbandry 
Department, University of Missouri, Columbia. He is in charge of 
research and teaching in animal breeding. Animal breeding there 
includes animal genetics and physiology, particularly physiological 
reproduction. He hopes to be conducting genetic studies of farm 
and laboratory animals... JAMES H. BYWATERS, Virginia Agri- 
cultural Experiment Station, Blacksburg, expects to plan and con- 
duct poultry research projects aimed toward development and guid- 
ance of the poultry industry in Virginia. Welcome to the South! 
Mr. Bywaters comes to Virginia from the U. S. Department of
Agriculture, Poultry Research Laboratory at East Lansing, Michigan
. . . J. F. KAGY, Agricultural Research Laboratory, The Dow 
Chemical Company, Seal Beach, California, writes, ‘‘It is true that 
most of our populations are skewed and if we were to obtain ac- 
curate evaluations of our experimental errors and treatment dif- 
ferences, transformations would have to be used. We are greatly 
in need of suggestions and help concerning the problem of experi- 
mental design. This problem is the single greatest difficulty with 
which many experienced research men are confronted. The
tendency is to make experiments too complicated.’’ Quite true. We 
hope to present a series of articles dealing with the designing of 
experiments in the near future. There have been numerous re- 
quests for illustrations from actual experiments ... ANTONIO E. 
MARINO, Sarmiento 246, Buenos Aires, Argentina, writes, ‘‘I remem- 
ber the nice days I spent at Raleigh in 1941. At present I am work- 
ing with the Argentine Seed Branch of Cargill Incorporated as a 
plant breeder, specially for developing corn hybrids for this
country.’’ . . . FERNANDO VILLAMIL-GARCIA, Estacion Agricola
Experimental, Palmira-Valle, Colombia, South America, has had a few
‘‘solos’’ into the statistical world even though he struggled a little 
when taking Mathematics 441 and 442 with Professor Snedecor!
. . . F. W. PRESTON, Preston Laboratories, Butler, Pennsylvania,
states, ‘‘although we are not concerned very much with statistics as 
applied to biology, we find that biologists tend to develop branches 
of statistics that are useful to us in glass technology. Here, unlike 
most branches of physics, we run into wide fluctuations of our 
measured quantities, and the simpler forms of statistics are not 
always adequate.’’ . . . L. C. BIRCH, Australian visiting research
worker at the Bureau of Animal Population, Oxford University,
thinks that Biometrics can continue to serve its most useful function
by summarizing some of the new developments in statistics. . . . 

H. P. DONALD, Institute of Animal Genetics, University of Edin- 
burgh, requests that Biometrics contain papers which deal with 
practical problems in a way intelligible to people with little mathe- 
matical skill. We would like to meet these objectives, but where are 
enough biological statisticians who can take time to write for 
Biometrics! . . . In discussing some of these points ERIC C. WOOD, 
Virol Limited, Ealing, England, recently made a profound state- 
ment in a letter to the editor. ‘‘An experimental design which is 
both sound in theory and convenient in practice is usually evolved 
only by two people, a statistician and a practising assayist in
collaboration.’’ . . . GEORG W. HARVEY has been appointed director of
the Statistical Division of the newly organized National Blood
Program of the American Red Cross. He will institute modern
procedures for the collection and analysis of relevant data, pertaining 
to blood collection and the fractionation of blood. This new pro- 
gram will provide blood and blood derivatives throughout the 
nation to help save lives, prevent needless suffering, and to further 
research. . . . ERRETT C. ALBRITTON, Professor of Physiology, The 
George Washington University School of Medicine, Washington, 
D. C., feels that some periodical might give space to the analyses of 
problems taken from current literature. He states, ‘‘The medical 
investigator has nowhere available, as a model for his own thinking, 
such explicit logical analyses of the research of others.’’ A statis- 
tical review of medical research sounds difficult! ...F. E. ALLISON, 
Senior Chemist, Bureau of Plant Industry, Beltsville, Maryland, 
has expressed an idea we have heard from others. ‘‘I might add 
that usually a research worker must be his own statistician because 
there is no one near with whom he can consult. Often he is greatly 
puzzled as to what procedure to follow. A book that gives a num- 
ber of typical experimental designs, suitable for various types of 
problems, and sets forth in detail the methods of calculation to be 
used, together with the logical conclusions to be drawn from the 
data after analysis, would, it seems to me, be of very great value. 
The information is now available, of course, but not readily, and 
certainly not in a single volume. Such a book on experimental 
designs and analytical procedures should not be limited to analysis 
of variance methods but should include other common procedures of 
use in obtaining or analyzing data. To be of most use the book 
should be as simple and non-mathematical as possible.’’ . . . F. X.
LAUBSCHER, senior research officer, College of Agriculture and
Experiment Station, Potchefstroom, Union of South Africa, is in this 
country on a commission to investigate fiber production and to visit 
experiment stations where corn breeding is being done. At Raleigh, 
he expressed interest in the statistical approaches to breeding tech- 
niques of peanuts, soybeans and small grains. . . . The Statistical 
Section of the Tennessee Public Health Association met May 6 (its 
first anniversary) in the Andrew Jackson Hotel, Nashville. They 
held both a morning and afternoon session with the following program: 
Morning Session, 9:00 A.M.: 1. Program of the Medical Research
Statistics Division of the Veterans Administration (PAUL M. DENSEN,
Chief of Division of Medical Research Statistics, Veterans
Administration). 2. Statistical Needs of Social Workers (LORA LEE
PEDERSON, Director of Nashville School of Social Work). 3. Fisher’s
Exact Significance Test for the Fourfold Table (MARGARET MARTIN,
Assistant Professor of Preventive Medicine and Public Health,
Vanderbilt University School of Medicine). *4. Statistical Training
Program of Tennessee Department of Public Health (SARA LOU HATCHER,
Statistician, Tennessee Department of Public Health). 5. Completeness
of Data on Birth Certificates (P. C. PETERSON, Director of Division of
Vital Statistics, Tennessee Department of Public Health).
Afternoon Session, 2:00 P.M.: 1. Public Health Statistical Service in
Mississippi (MARGARET E. RICE, Supervisor of Public Health Statistics,
Mississippi State Board of Health). *2. Source of Report of Notifiable
Diseases (MRS. THOMAS PARRISH, Statistical Aide, Tennessee Department
of Public Health). *3. Study of Reported Cases of Tuberculosis (LOLA
MAI UPCHURCH, Statistician, Tennessee Department of Public Health).
4. Observations Regarding Health Work in Chile (RUTH R. PUFFER,
Director of Statistical Service, Tennessee Department of Public Health). 
Paul Densen presided. Mimeographed copies of the three papers with 
an asterisk are available upon request from the Tennessee State Depart- 
ment of Public Health. Thirty-four persons attended. The following 
officers were elected for the coming year: Chairman, ANN DILLON, 
Tennessee Department of Public Health; Vice Chairman, HELEN
CHANDLER, Tennessee Valley Authority, Wilson Dam, Alabama; and
Secretary, MARGARET MARTIN. 